Why we value what we worked for — and when we commit to an answer
Paper 6BC · Pødenphant Lund (2026r)
I study language models to understand people.
You value the shelf you assembled yourself more than the identical one someone assembled for you. Ask a language model when it will commit to an answer, and a base model spreads its choice point 3.4 times wider than the RLHF-trained version of the same model. These look like two completely unrelated observations. This article proposes that both are readouts of the same thing: a substrate running a race between competing candidates and leaving traces behind when it settles the race. It is a programmatic proposal, not a finished result. I put forward two traces that one might be able to measure from a language model's output alone, and for each an experimental programme that it would have to pass to become a real finding.
The puzzle: two patterns we keep seeing
The first pattern runs through decades of behavioural research. People value things they worked for themselves more than identical things they got for free. They overvalue what they own. They keep throwing money after lost causes. They remember what they generated themselves better than what they were shown. They explain their own effort to themselves by deciding the outcome must be worth it. They use felt effort as a shortcut for quality.
Six different effects: the IKEA effect, the endowment effect, the sunk-cost fallacy, the generation effect, effort justification, and the effort heuristic. Each has its own specialists, its own textbook explanation, its own paradigm. None is currently classified as a variant of any other.
The second pattern is newer. If you give a language model an open task and let it commit to an answer at any point in its response, where exactly does it commit? On a base model (one that has not been polished with RLHF) the point where it settles on an answer shifts dramatically across conditions and lands close to a classic mathematical optimum from optimal-stopping theory. On the instruction-tuned version of the same model, fine-tuned on identical content, that point barely moves, and the link to the mathematical optimum is gone. Two models with the same knowledge, very different patterns of when they settle.
The proposal
The article proposes that both patterns are readouts of the same underlying machinery: a substrate running races between candidate outcomes under limited time and energy. The race leaves traces in the substrate. You can read those traces in two different ways.
Readout 1: traces from earlier races (effort-value)
When a substrate settles a race under limited resources, the decision leaves a hysteresis trace: a memory of the race the substrate had to run. Friction invested in the race deepens the trace. The trace then carries more weight in later comparisons between that outcome and others.
Why do you value the shelf you assembled yourself more than the identical pre-assembled one? Because the assembly process ran a race inside you between competing strategies (which screw goes where, which way the manual points, why it won't click into place). The race left a trace. When you later compare your shelf with the pre-assembled one in your judgement, the trace makes its case in the comparison.
On this proposal, six classic biases share this race mechanic, and the article is careful about scope. It does not claim the race mechanic explains all six effects under all conditions. It claims the race mechanic explains an effort-essential subset of these effects. Endowment-effect variants where subjects are simply told "imagine you own X", with no actual effort, are not explained by this framework; neither are sunk-cost variants where the commitment was made without effort. Those variants exist; the article does not deny them. The race mechanic is offered as one component of the family, not as the whole story.
Readout 2: where it settles (the commit point)
Any race architecture has to commit at some point. The moment of commitment is itself information about the substrate. Does it commit early? Late? Does the point where it settles move across conditions? Does that point track recognition state, or has it become decoupled?
A preliminary experiment compared a base language model with its instruction-tuned counterpart, both fine-tuned on the same content. It found:
- The point where the base model settles on an answer spreads 3.4 times wider across conditions than the instruct model's. On this reading, RLHF has disciplined the model's settling: it has left the model a narrower range of when-should-I-commit positions.
- The point where the base model settles drifts toward the secretary problem's optimum (1/e ≈ 0.368) as the task interpretation deepens. The base model seems to "feel its way" to the right moment to settle, shifting it as the interpretation grows richer.
- The base model shows a clear coupling between recognition and settling on an answer (correlation r = 0.528). The instruct model does not (r = 0.104). RLHF appears to have decoupled the decision to settle from the substrate's underlying recognition state.
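For readers who want to see where the 1/e benchmark comes from, here is a minimal sketch of the classic secretary problem itself. It is not the paper's analysis. The function names and the choice of n = 1000 are illustrative assumptions. The rule being scored is the standard one: observe the first k candidates without committing, then commit to the first later candidate that beats all of them.

```python
import math


def success_probability(n: int, k: int) -> float:
    """Probability of picking the best of n candidates when the first k
    are only observed and the next record-setter is accepted."""
    if k == 0:
        return 1.0 / n  # no observation phase: the first candidate is taken
    # Standard closed form: (k / n) * sum_{j=k}^{n-1} 1 / j
    return (k / n) * sum(1.0 / j for j in range(k, n))


def best_cutoff(n: int) -> tuple[int, float]:
    """Return (k, probability) for the cutoff k that maximises success."""
    p, k = max((success_probability(n, k), k) for k in range(n))
    return k, p


n = 1000  # illustrative pool size, an assumption of this sketch
k, p = best_cutoff(n)
print(f"best cutoff fraction k/n = {k / n:.3f}")
print(f"success probability      = {p:.3f}")
print(f"1/e                      = {1 / math.e:.3f}")
```

Both the optimal cutoff fraction and the resulting success probability approach 1/e as n grows, which is why a settle position near 0.368 of the response is the natural reference point in this reading.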
Important honesty here. The numerical match between the point where the base model settles and the secretary problem's optimum 1/e is interesting, but the article records it as a case to be replicated, not a finding. A single-cell numerical match without independent replication is not yet evidence for a substrate property. What is reproducible at the directional level is the base-vs-instruct asymmetry: that the base model has a wider, more condition-sensitive distribution of when it settles than the trained instruct model. That is what the article claims.
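To make the two summary statistics concrete, the following is a sketch of how a spread ratio and a recognition-settling correlation could be computed. It rests on stated assumptions: the article does not say which spread measure the paper uses, so population standard deviation is a stand-in here, and every number in the example is made up for illustration and is not the paper's data.

```python
from math import sqrt
from statistics import mean, pstdev


def spread(commit_positions):
    """Spread of per-condition settle positions.
    Assumption: population standard deviation; the paper may use another measure."""
    return pstdev(commit_positions)


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical settle positions per condition, as a fraction of the response.
base_positions = [0.20, 0.35, 0.50, 0.65, 0.80]
instruct_positions = [0.40, 0.45, 0.50, 0.55, 0.60]
spread_ratio = spread(base_positions) / spread(instruct_positions)

# Hypothetical recognition scores paired with the base model's settle positions.
recognition_scores = [0.1, 0.4, 0.3, 0.7, 0.9]
coupling = pearson_r(recognition_scores, base_positions)

print(f"spread ratio (base / instruct): {spread_ratio:.2f}")
print(f"recognition-settling coupling r: {coupling:.2f}")
```

The directional claim corresponds to the spread ratio being reliably above 1 across replications; the exact value depends on the spread measure chosen.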
What the article carefully does not claim
The article is unusually disciplined about what it does and does not establish. The honest list:
- It does not claim a unification of effort-value biases. The race mechanic is one component of an effort-essential subset; the full picture for any single bias involves further mechanisms (signalling, cognitive dissonance, ownership-as-self-extension, and so on) that the framework does not displace.
- It does not claim a validated quantitative match to 1/e. The single-cell case is recorded for replication, not declared a result.
- It does not derive loss aversion from first principles. There is a speculative coda offering a "soft-irrevocability-reversal-cost" reading, and the article labels it explicitly as a speculation that no new evidence supports.
- It does not replace any native treatment of the six biases. The native vocabularies (Ariely, Kahneman, Slamecka, Aronson, Kruger) stay valid where their conditions hold; the race mechanic adds an explanation for the effort-essential subset.
What the article does claim: that two specific substrate signatures (trace dominance in comparison-judgement races, commit position in the answer trajectory) can be measured from logprobs alone on language-model substrates, and that the measurement programme each one requires can be specified concretely. The contribution is a research direction with an empirical pipeline, not a finished result.
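As an illustration of what reading a commit position from logprobs might look like, here is one hypothetical operationalisation. It is not the paper's pipeline. It assumes that at each step of the response we already have log-probabilities for the same fixed set of candidate answers; how those per-step scores are obtained is left open. It also assumes a hand-picked `margin`. The function name `commit_position` is invented for this sketch.

```python
def commit_position(step_logprobs, margin=1.0):
    """Fraction of the response at which the eventual winner takes a lasting lead.

    step_logprobs: list of dicts, one per step, mapping candidate label -> logprob.
    Every step is assumed to score the same candidate labels.
    margin: required logprob gap over the best rival (an arbitrary choice here).
    Returns None if the winner does not lead by `margin` at the final step.
    """
    if not step_logprobs:
        raise ValueError("need at least one step")
    final = step_logprobs[-1]
    winner = max(final, key=final.get)

    def leads(step):
        rivals = [lp for label, lp in step.items() if label != winner]
        return not rivals or step[winner] - max(rivals) >= margin

    commit = None
    # Scan backwards: find the start of the final unbroken run of clear leads.
    for t in range(len(step_logprobs) - 1, -1, -1):
        if leads(step_logprobs[t]):
            commit = t
        else:
            break
    return None if commit is None else commit / len(step_logprobs)


# Hypothetical five-step trajectory over two candidate answers.
trajectory = [
    {"A": -1.2, "B": -1.0},
    {"A": -1.0, "B": -1.1},
    {"A": -0.3, "B": -2.0},
    {"A": -0.2, "B": -2.5},
    {"A": -0.1, "B": -3.0},
]
print(commit_position(trajectory))
```

Defining the commit as the start of the last unbroken lead, rather than the first time the winner is ahead, keeps brief early leads that are later lost from counting as settling. Other definitions are equally defensible, and the result will depend on the one chosen.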
Why it matters
For behavioural science. If the race-mechanic component is right, six biases that look distinct on the surface share substrate architecture underneath. The grade-(b) prediction (an inverted U over effort intensity, where the bias vanishes at trivial difficulty and again at overwhelming difficulty) is testable on each of the six biases independently. The IKEA-effect dissipation boundary (Norton et al. 2012) is already in the data, just not explained by the native account. The same boundary should show up in the other five.
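The inverted-U prediction can be illustrated with a deliberately crude shape check. This is a sketch under stated assumptions, not a statistical test and not the paper's method: the function name, the `edge_fraction` threshold, and the example numbers are all invented for illustration. A real analysis would need a formal test of non-monotonicity with uncertainty estimates.

```python
def looks_inverted_u(effort_levels, bias_sizes, edge_fraction=0.5):
    """Crude check: the largest bias sits at an interior effort level and
    the bias at both extremes is at most `edge_fraction` of that peak."""
    if len(effort_levels) != len(bias_sizes) or len(bias_sizes) < 3:
        raise ValueError("need at least three matched (effort, bias) points")
    # Order the observations by effort level.
    sizes = [bias for _, bias in sorted(zip(effort_levels, bias_sizes))]
    peak = max(sizes)
    peak_index = sizes.index(peak)
    interior_peak = 0 < peak_index < len(sizes) - 1
    ends_low = (sizes[0] <= edge_fraction * peak
                and sizes[-1] <= edge_fraction * peak)
    return peak > 0 and interior_peak and ends_low


# Hypothetical bias magnitudes at five effort levels (made-up numbers).
effort = [1, 2, 3, 4, 5]
print(looks_inverted_u(effort, [0.00, 0.30, 0.50, 0.25, 0.05]))
print(looks_inverted_u(effort, [0.10, 0.20, 0.30, 0.40, 0.50]))
```

The first series rises and then falls, so it passes; the second rises monotonically and fails. The same check could in principle be run separately for each of the six biases.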
For AI/LLM evaluation. The point where a model settles on an answer is a substrate signature you can read from logprobs alone, for free, on any model that returns them. The base-vs-instruct difference in settling discipline is a substrate effect of RLHF that needs no behavioural benchmarking to measure. It is there in the model's own output trajectory. Other models, other training regimes, and other RLHF variants should produce different signatures of where they settle. The article is an invitation to measure them.
For Friction Theory. The two readouts make the race architecture (Paper 1) empirically falsifiable in a new way. If the trace-dominance signature fails to show up in the effort-essential subset of the six effort-value biases, the race account is in trouble. If the signature for where the model settles fails to replicate across cells, the substrate-discipline framework is in trouble. Both are testable predictions on which the framework's claim to be "substrate-level" depends.
The cite
The full paper is open-access on Zenodo. Concept DOI:
Related on this site:
- Paper 6 core (Matched Friction Under Hysteresis) — the schema paper. Paper 6BC instantiates two of the schema's specific signatures.
- Paper 1 (Friction Theory) — the race axioms (R1–R3) the two readouts build on.
- Paper 4B (Substrates Encode Experience) — encoding-through-loading. Readout 1's trace dominance reads what 4B's encoding leaves behind.
- Paper 13 (Operational Friction Theory) — race opening, where it settles, manifested behaviour. Readout 2's point where the model settles is the empirical handle on Paper 13's race-opening structure.
- Paper 2B (ICL/FT memory) — the working-memory / long-term-memory distinction. Paper 6BC's base-vs-instruct asymmetry in when the model settles is consistent with 2B's prediction that fine-tuning compresses the calibrated distribution.