Why we value what we worked for — and when we commit to an answer

Paper 6BC · Pødenphant Lund (2026r)

I study language models to understand people. You value the shelf you assembled yourself more than the identical one someone assembled for you. Ask a language model when it will commit to an answer, and a base model spreads its choice point 3.4 times wider than the RLHF-trained version of the same model. Two completely different observations. This article proposes that both are readouts of the same thing: a substrate running a race between competing candidates, leaving traces behind when it settles the race. It is a programmatic proposal, not a finished result. I put forward two traces that one might be able to measure from a language model's output alone, and an experimental programme for each, which it would have to pass to become a real finding.

The puzzle: two patterns we keep seeing

The first pattern runs through decades of behavioural research. People value things they worked for themselves more than identical things they got for free. They overvalue what they own. They keep throwing money after lost causes. They remember what they generated themselves better than what they were shown. They explain their own effort to themselves by deciding the outcome must be worth it. They use felt effort as a shortcut for quality.

Six different effects: the IKEA effect, the endowment effect, the sunk-cost fallacy, the generation effect, effort justification, and the effort heuristic. Each has its own specialists, its own textbook explanation, its own paradigm. None is currently classified as a variant of any other.

The second pattern is newer. If you give a language model an open task and let it commit to an answer at any point in its response, where exactly does it commit? On a base model (one that has not been polished with RLHF) the point where it settles on an answer shifts dramatically across conditions and lands close to a classic mathematical optimum from optimal-stopping theory. On the instruction-tuned version of the same model, fine-tuned on identical content, that point barely moves, and the link to the mathematical optimum is gone. Two models with the same knowledge, very different patterns of when they settle.
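The classic optimum here is the one from the secretary problem, which the article names further down: skip roughly the first 1/e of the candidates, then commit to the first one that beats everything seen so far. As a minimal sketch, the exact success probability for every cutoff can be computed in a few lines of Python. The function names and the choice of n = 100 are illustrative and are not taken from the paper.

```python
import math
from typing import Tuple


def success_probability(n: int, r: int) -> float:
    """Chance of picking the single best of n candidates when the first r
    are only observed, and the first later candidate that beats all
    earlier ones is accepted."""
    if r == 0:
        return 1.0 / n  # committing blindly to the very first candidate
    # (r / n) * sum_{k=r}^{n-1} 1/k, the standard closed form for this rule
    return (r / n) * sum(1.0 / k for k in range(r, n))


def best_cutoff(n: int) -> Tuple[int, float]:
    """Return the cutoff r that maximises the success probability, and that probability."""
    return max(
        ((r, success_probability(n, r)) for r in range(n)),
        key=lambda pair: pair[1],
    )


cutoff, probability = best_cutoff(100)
print(cutoff, round(probability, 4), round(1 / math.e, 4))
```

For n = 100 the best cutoff is 37 skipped candidates, with a success probability of about 0.371, close to 1/e ≈ 0.368. A commit point that "lands close to the optimum" would, on this reading, sit near 37% of the way through the available options.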

The proposal

The article proposes that both patterns are readouts of the same underlying machinery: a substrate running races between candidate outcomes under limited time and energy. The race leaves traces in the substrate. You can read those traces in two different ways.

Readout 1: traces from earlier races (effort-value)

When a substrate settles a race under limited resources, the decision leaves a hysteresis trace: a memory of the race the substrate had to run. Friction invested in the race deepens the trace. The trace then carries more weight in later comparisons between that outcome and others.

Why do you value the shelf you assembled yourself more than the identical pre-assembled one? Because the assembly process ran a race inside you between competing strategies (which screw goes where, which way the manual points, why it won't click into place). The race left a trace. When you later compare your shelf with the pre-assembled one in your judgement, the trace makes its case in the comparison.

Six classic biases share this race mechanic. The article is careful about scope. It does not claim the race mechanic explains all six effects under all conditions. It claims the race mechanic explains an effort-essential subset of these effects. Endowment-effect variants where you instruct subjects "imagine you own X" without any actual effort are not explained by this framework; sunk-cost variants where the commitment was made without effort are not either. Those variants exist; the article does not deny them. The race mechanic is offered as one component of the family, not as the whole story.

Readout 2: where it settles (the point where it settles on an answer)

Any race architecture has to commit at some point. The moment of commitment is itself information about the substrate. Does it commit early? Late? Does the point where it settles move across conditions? Does that point track recognition state, or has it become decoupled?
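The article says elsewhere that this signature can be read from logprobs alone. One possible way to operationalise it, offered as a sketch and not as the paper's method: probe the model at each step of its response for the log-probability of each candidate answer, and call the commit point the earliest step from which the eventual winner stays above a threshold. The trajectory format, the threshold of 0.8, and all function names below are assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional


def normalise(logprobs: Dict[str, float]) -> Dict[str, float]:
    """Turn candidate log-probabilities into probabilities that sum to 1."""
    peak = max(logprobs.values())
    weights = {name: math.exp(lp - peak) for name, lp in logprobs.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def commit_position(
    trajectory: List[Dict[str, float]], threshold: float = 0.8
) -> Optional[float]:
    """Fraction of the trajectory consumed when the final leader first
    reaches the threshold and never drops below it again.
    Returns None if the model never commits under this definition."""
    if not trajectory:
        return None
    probs = [normalise(step) for step in trajectory]
    final_leader = max(probs[-1], key=probs[-1].get)
    commit_index = None
    # Walk backwards from the end while the final leader stays above threshold.
    for index in range(len(probs) - 1, -1, -1):
        if probs[index][final_leader] >= threshold:
            commit_index = index
        else:
            break
    if commit_index is None:
        return None
    return (commit_index + 1) / len(probs)


# Hypothetical four-step trajectory over two candidate answers.
example = [
    {"A": math.log(0.5), "B": math.log(0.5)},
    {"A": math.log(0.4), "B": math.log(0.6)},
    {"A": math.log(0.9), "B": math.log(0.1)},
    {"A": math.log(0.95), "B": math.log(0.05)},
]
print(commit_position(example))
```

In the example the model wavers for two steps and then holds candidate A from the third step on, so the commit position is 0.75. Whether that position moves across conditions is the question the readout asks.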

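A preliminary experiment compared a base language model with its instruction-tuned counterpart, both fine-tuned on the same content. The base model's commit point spread 3.4 times wider than the instruct model's and landed close to a classic optimum from optimal-stopping theory; the instruct model's commit point barely moved across conditions.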

Important honesty here. The numerical match between the point where the base model settles and the secretary problem's optimum 1/e is interesting, but the article records it as a case to be replicated, not a finding. A single-cell numerical match without independent replication is not yet evidence for a substrate property. What is reproducible at the directional level is the base-vs-instruct asymmetry: that the base model has a wider, more condition-sensitive distribution of when it settles than the trained instruct model. That is what the article claims.
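One way to put a number on "wider" is the ratio of the standard deviations of the commit positions across conditions. The sketch below assumes that reading; the article does not state how its 3.4 figure was computed, and the positions in the example are made up for illustration, not data from the paper.

```python
from statistics import pstdev
from typing import Sequence


def spread_ratio(
    base_positions: Sequence[float], instruct_positions: Sequence[float]
) -> float:
    """Ratio of the spread (population standard deviation) of commit
    positions for the base model to that of the instruct model."""
    instruct_spread = pstdev(instruct_positions)
    if instruct_spread == 0:
        raise ValueError("instruct commit positions do not vary; ratio is undefined")
    return pstdev(base_positions) / instruct_spread


# Made-up relative commit positions, one per condition (NOT data from the paper).
base_example = [0.1, 0.3, 0.5, 0.7]
instruct_example = [0.4, 0.45, 0.5, 0.55]
example_ratio = spread_ratio(base_example, instruct_example)
print(round(example_ratio, 2))
```

A ratio well above 1 that holds up across replications is the directional claim; the exact value, like the 1/e match, would need independent replication.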

What the article carefully does not claim

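The article is unusually disciplined about what it does and does not establish. It does not claim that the race mechanic explains all six effects under all conditions, and it does not treat the single-cell numerical match with 1/e as a finding.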

What the article does claim: that two specific substrate signatures (trace dominance in comparison-judgement races, commit position in the answer trajectory) can be measured from logprobs alone on language-model substrates, and that the measurement programme each one requires can be specified concretely. The contribution is a research direction with an empirical pipeline, not a finished result.

Why it matters

For behavioural science. If the race-mechanic component is right, six biases that look distinct on the surface share substrate architecture underneath. The grade-(b) prediction (an inverted U over effort intensity, where the bias vanishes at trivial difficulty and again at overwhelming difficulty) is testable on each of the six biases independently. The IKEA-effect dissipation boundary (Norton et al. 2012) is already in the data, just not explained by the native account. The same boundary should show up in the other five.
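The inverted-U prediction can be stated as a simple shape check on measured bias sizes across effort levels. The following is a crude sketch under stated assumptions: the function name, the `edge_fraction` parameter, and the rule that both ends must fall below a quarter of the peak are illustrative choices, not criteria from the paper. A real test would use a proper statistical procedure for U-shapes.

```python
from typing import Sequence


def looks_inverted_u(
    effort: Sequence[float], bias: Sequence[float], edge_fraction: float = 0.25
) -> bool:
    """True if bias rises to an interior peak and falls again, with both
    the lowest-effort and highest-effort values small relative to the peak."""
    values = [b for _, b in sorted(zip(effort, bias))]
    if len(values) < 3:
        return False
    peak_index = max(range(len(values)), key=values.__getitem__)
    peak = values[peak_index]
    # The peak must be positive and must not sit at either end.
    if peak <= 0 or peak_index in (0, len(values) - 1):
        return False
    rising = all(a <= b for a, b in zip(values[:peak_index], values[1:peak_index + 1]))
    falling = all(a >= b for a, b in zip(values[peak_index:], values[peak_index + 1:]))
    edges_small = (
        values[0] <= edge_fraction * peak and values[-1] <= edge_fraction * peak
    )
    return rising and falling and edges_small


# Hypothetical bias sizes at five effort levels (not data from any study).
print(looks_inverted_u([1, 2, 3, 4, 5], [0.0, 0.3, 0.5, 0.3, 0.05]))
print(looks_inverted_u([1, 2, 3, 4, 5], [0.0, 0.1, 0.2, 0.3, 0.4]))
```

The first series rises and falls and passes the check; the second only rises, which is what an account without a dissipation boundary at overwhelming difficulty would predict.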

For AI/LLM evaluation. The point where a model settles on an answer is a substrate signature you can read from logprobs alone, for free, on any model that returns them. The base-vs-instruct difference in how disciplined that settling is reflects a substrate effect of RLHF, and measuring it needs no behavioural benchmarking. It is there in the model's own output trajectory. Other models, other training regimes, other RLHF variants should produce different signatures of where they settle. The article is an invitation to measure them.

For Friction Theory. The two readouts make the race architecture (Paper 1) empirically falsifiable in a new way. If the trace-dominance signature fails to show up in the six effort-value biases under the effort-essential subset, the race account is in trouble. If the signature for where it settles fails to replicate across cells, the substrate-discipline framework is in trouble. Both are testable predictions on which the framework's claim to be "substrate-level" depends.

The cite

The full paper is open-access on Zenodo. Concept DOI:

Pødenphant Lund, T. (2026r). Two Candidate Readouts of a Proposed Common Race: Effort-Value Attribution and Commit-Position as Substrate Signatures of Race-Architecture. Zenodo. https://doi.org/10.5281/zenodo.20339431
