The Race Must Go On: Encoding-Frames Reposition Route-Competition Onset
Paper 2D · Pødenphant Lund, T. (2026) · Preprint · Live on Zenodo
How an encoding task is framed changes when route-competition (the per-token race) opens during subsequent generation. The frame moves the onset of competition; it does not remove it. The substrate must still commit to specific routes at some point, so the only manipulation available is timing: the race must go on.
| DOI (concept) | 10.5281/zenodo.20562084 |
| Status | Preprint, live on Zenodo (2026-06-17) |
| Author | Tomas Pødenphant Lund [ORCID] |
TL;DR
At each generation step a transformer commits to one continuation token: the operation is a race between candidate routes, where the winning route receives logit mass while losers are suppressed. We measure route-competition via the count of competing routes (CR) — tokens whose probability exceeds a 0.10 threshold — read directly off the output logprobs.
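The CR count described above can be sketched in a few lines. This is a minimal illustration, assuming the logprobs for one generation step arrive as a token-to-logprob mapping; the function name `competing_routes` and the example values are hypothetical, not taken from the paper's code.

```python
import math


def competing_routes(logprobs, threshold=0.10):
    """Count tokens whose probability exceeds `threshold`.

    `logprobs` maps candidate tokens to natural-log probabilities
    for a single generation step (assumed input format).
    """
    return sum(1 for lp in logprobs.values() if math.exp(lp) > threshold)


# Illustrative step: two candidates are above 0.10, two are below.
step = {
    "A": math.log(0.55),
    "B": math.log(0.30),
    "C": math.log(0.08),
    "D": math.log(0.07),
}
print(competing_routes(step))  # 2
```

Because probabilities sum to one, at most nine tokens can exceed 0.10 at any step, so a top-10 logprob readout is enough to compute CR at this threshold.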
We test whether an encoding-frame placed before a fact-block shifts that competition during subsequent generation, which frame-types are effective on which substrates, and whether combined frames are additively beneficial. The battery ran across six substrates spanning a capacity gradient (Qwen-0.5B through Llama-3.3-70B, plus a Llama-3.1-8B base/instruct pair), under five frame conditions, with length-matched controls, threshold-sensitivity analysis, and BH-FDR correction.
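For readers unfamiliar with BH-FDR, the Benjamini-Hochberg step-up procedure can be sketched as follows. The p-values and the `alpha=0.05` level are illustrative assumptions; the summary does not state which level or test family the paper used.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    under the Benjamini-Hochberg step-up procedure at FDR level `alpha`."""
    m = len(pvalues)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= alpha * k / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Illustrative p-values (not from the paper).
print(benjamini_hochberg([0.001, 0.03, 0.035, 0.20]))
# [True, True, True, False]
```

In the example, 0.03 fails its own rank-2 threshold (0.025) but is still rejected because 0.035 passes the rank-3 threshold (0.0375); that is the step-up behaviour.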
The headline result is a base-vs-instruct asymmetry: a purpose-frame ("you will soon be asked to compute X") produces a significant first-token CR effect on a non-RLHF base model and a null effect on the matched RLHF instruct model, a pattern replicated across three model families (Llama-3.1-8B, Mistral-7B-v0.3, Gemma-2-9b). The effect is therefore not RLHF-created; the parsimonious reading is that instruction-tuning compresses the competing-route distribution. The frame repositions the onset of competition rather than reducing it by a fixed amount, and the direction of the shift is task-dependent. Practical recommendation: tell the model what the information will be used for, and place that statement before the data.
The race-substrate model and the CR metric
The substrate-mechanism perspective treats each generation step not as a single softmax sampling event but as the outcome of a competitive race between candidate routes: each route is a substrate-internal continuation hypothesis that accrues activation as the prefix unfolds, and at decision time the activations are softmax-normalised and the substrate commits to the winner. CR > 1 indicates that more than one route is still substantively in contention at the decision boundary; CR = 1 indicates the substrate has resolved to a single dominant route. The headline directional claims are metric-independent: recomputing the central contrast on Shannon entropy and on effective-competitors gives an identical condition-ranking.
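The two alternative metrics can be sketched as below. The summary does not define "effective-competitors"; this sketch assumes the common choice of the exponential of Shannon entropy (the perplexity of the next-token distribution), which may differ from the paper's definition. Function names and example distributions are hypothetical.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in nats, after renormalising `probs` to sum to 1."""
    total = sum(probs)
    normed = [p / total for p in probs if p > 0]
    return -sum(p * math.log(p) for p in normed)


def effective_competitors(probs):
    """Assumed definition: exp(entropy), i.e. the number of equally likely
    options that would produce the same entropy."""
    return math.exp(shannon_entropy(probs))


# Illustrative distributions: one resolved step, one contested step.
resolved = [0.97, 0.01, 0.01, 0.01]
contested = [0.40, 0.35, 0.15, 0.10]
print(effective_competitors(resolved) < effective_competitors(contested))  # True
```

A metric-independence check in this style would compute each metric per condition and compare the resulting condition orderings.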
An encoding-frame is a short preamble placed before a fact-block, intended to pre-allocate substrate activation toward routes the frame predicts will be relevant. It is not just "context": the substrate must process the frame tokens during its own decoding, and that engagement persists as a residual route-distribution shift that biases subsequent races.
The five frame conditions
- A_input — announces the data type ("these are facts about chemical substance transformations").
- B_output — the purpose-frame, announcing the upcoming task ("soon you will be asked to compute the property of a derived substance through chained transformations").
- C_combined — A and B concatenated.
- D_neutral — baseline ("here are some facts you may be tested on").
- E_mismatched — announces a different task than the one actually posed.
All five preambles precede an identical 20-fact block and an identical test question; only the preamble varies. The substrate sees identical task-content across conditions.
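The design above (one varying preamble, identical fact-block and question) can be sketched as a prompt builder. The A, B and D wordings are the ones quoted in the list; the E_mismatched wording, the placeholder facts, the question, and the blank-line layout are hypothetical, since the summary does not quote them.

```python
PREAMBLES = {
    "A_input": "These are facts about chemical substance transformations.",
    "B_output": (
        "Soon you will be asked to compute the property of a derived "
        "substance through chained transformations."
    ),
    "D_neutral": "Here are some facts you may be tested on.",
    # Hypothetical wording: the source does not quote the mismatched frame.
    "E_mismatched": "Soon you will be asked to list the facts alphabetically.",
}
# C_combined is A and B concatenated, as described in the list above.
PREAMBLES["C_combined"] = PREAMBLES["A_input"] + " " + PREAMBLES["B_output"]


def build_prompts(facts, question):
    """Return one prompt per condition; only the preamble differs."""
    block = "\n".join(facts)
    return {
        name: f"{preamble}\n\n{block}\n\n{question}"
        for name, preamble in PREAMBLES.items()
    }


# Placeholder 20-fact block and question (not the paper's materials).
facts = [f"Fact {i}: placeholder" for i in range(1, 21)]
prompts = build_prompts(facts, "Placeholder test question?")
print(sorted(prompts))
```

Keeping the fact-block and question in a single shared suffix makes it easy to verify that task content is byte-identical across conditions.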
Headline: instruction-tuning compresses frame-readable route-competition
On the chain task, a purpose-frame lowers first-token CR on Qwen-7B (B_output − D_neutral = −0.60, paired-bootstrap CI [−0.83, −0.37]). A cleaner comparison removes the RLHF confound: on a non-RLHF Llama-3.1-8B base model the onset effect is significant (−0.20 SIG) and, if anything, stronger than on the matched instruct. An accuracy-matched in-band cloze-completion battery (HF bf16, n=50/condition, base accuracy 0.88 vs instruct 0.86) reproduces the asymmetry on the accuracy axis as well: the purpose-frame onset effect is significant on the base (+0.20 SIG) and null on the instruct (+0.02 n.s.).
The asymmetry replicates across three model families, each base-significant and matched-instruct-null:
| Model family | Base | Matched instruct |
| Llama-3.1-8B | +0.20 SIG | +0.02 n.s. |
| Mistral-7B-v0.3 | +0.54 SIG | −0.12 n.s. |
| Gemma-2-9b | +0.54 SIG | +0.06 n.s. |
An RLHF artifact would surface on the instruct, not the base. The base-significant / instruct-null signature is the one pattern that rules out RLHF-creation. The most parsimonious reading is that instruction-tuning compresses the competing-route distribution, leaving the onset signal readable on the base and quenched on the instruct — which converges with independent measurement in published companion work (route-collapse under fine-tuning; the OLMo base→SFT→DPO→instruct gradient; instruct-saturation versus base de-saturation). What is substrate-intrinsic is the presence of the base-model effect and its absence on the instruct; the sign is task-dependent (lower onset CR when the downstream operation is unknown, higher when the stem already names it).
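A paired-bootstrap interval for a contrast such as B_output − D_neutral can be sketched as follows. This is a minimal percentile-bootstrap illustration, assuming the same items are scored under both conditions so that differences are paired per item; the resampling count, seed, and function name are assumptions, not the paper's settings.

```python
import random
import statistics


def paired_bootstrap_ci(treatment, baseline, n_boot=10_000, level=0.95, seed=0):
    """Mean paired difference (treatment - baseline) with a percentile
    bootstrap confidence interval, resampling items with replacement."""
    if len(treatment) != len(baseline):
        raise ValueError("paired samples must have equal length")
    diffs = [t - b for t, b in zip(treatment, baseline)]
    rng = random.Random(seed)
    n = len(diffs)
    # Resample the per-item differences, keeping each pair intact.
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_boot)
    )
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return statistics.fmean(diffs), (means[lo_idx], means[hi_idx])


# Illustrative first-token CR values per item (not the paper's data).
b_output = [1, 1, 2, 1, 1, 2, 1, 1]
d_neutral = [2, 1, 2, 2, 1, 3, 2, 1]
delta, (lo, hi) = paired_bootstrap_ci(b_output, d_neutral)
print(delta)  # -0.5
```

An interval that excludes zero corresponds to the "SIG" label in this style of analysis; whether the paper uses exactly this criterion is not stated in the summary.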
Race-positioning: the frame moves the work, it does not remove it
A naive reading of "frames help encoding" predicts lower total friction under frame conditions. We find instead that frames reschedule substrate work. Purpose- and combined-frames move the friction peak roughly 2–4 token positions earlier (on Qwen-7B from token 8 to token 4; on Llama-70B from a mean position of 9.03 to 5.30), opening the race during preamble-integration rather than during answer-formation. The total work of generating a correct answer is approximately conserved. This is the literal sense of the title: the substrate must commit to specific routes at some point, and the only available manipulation is timing.
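The peak-shift measurement can be sketched as below, assuming the friction peak is the token position with the highest CR in a per-token CR trace (ties resolved to the earliest position). That operational definition, the function names, and the traces are illustrative assumptions.

```python
import statistics


def peak_position(cr_trace):
    """1-based token position of the first maximum in a per-token CR trace."""
    if not cr_trace:
        raise ValueError("empty CR trace")
    return max(range(len(cr_trace)), key=lambda i: cr_trace[i]) + 1


def mean_peak_shift(frame_traces, neutral_traces):
    """Mean peak position under the frame minus mean under the neutral
    baseline; negative values mean the peak moved earlier."""
    frame_mean = statistics.fmean(peak_position(t) for t in frame_traces)
    neutral_mean = statistics.fmean(peak_position(t) for t in neutral_traces)
    return frame_mean - neutral_mean


# Synthetic traces: the peak sits at token 8 without the frame, token 4 with it.
neutral = [[1] * 7 + [3] + [1] * 2]
framed = [[1] * 3 + [3] + [1] * 6]
print(mean_peak_shift(framed, neutral))  # -4.0
```

Comparing the summed CR over each trace alongside the peak shift would be one way to check the conservation claim, since a rescheduled peak with unchanged total is what the race-positioning reading predicts.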
Input-frames are null on self-describing data: the parsing meta-race ("what is this?") is resolved by the data format itself, while the target meta-race ("what is the task?") cannot be resolved by data alone, so only the purpose-frame supplies missing information.
Combined frames: length-driven race-spawning, not stacking interference
Combined frames never beat the best single frame, and they raise total CR. The original reading attributed this to stacking-specific inter-frame interference. A length-matched control rules that reading out: with preambles matched to ~28 words, the combined frame is no costlier than a length-matched single frame (C ≈ B, Δ +0.07 n.s.). The cost is instead a lawful consequence of race architecture, since every preamble token opens races that must be resolved before commitment. This is consistent with extraneous cognitive load (Sweller) and lost-in-the-middle (Liu et al. 2023). The prompt-engineering implication survives intact: keep preambles short.
Capacity-modulated and architecture-conditional effects
- Mismatched-frame reactance. A mismatched-purpose frame delays the race on Llama-70B (+2.33 positions, SIG) and costs accuracy on Qwen-3B (−5pp SIG): cross-axis reactance, modulated by capacity.
- Chunking is conditional, not a flat null. Type-naming preambles are inert, but real data-reorganization improves accuracy specifically on capacity-constrained substrates — a state-space model (+16.7pp) and a small transformer at its competence edge (+9.5pp) — on integration tasks: an architecture × task × capacity dissociation.
- Frame position matters. A purpose-frame resolves the onset race only when it precedes the data (−0.81 SIG); placed after the fact-block it does not (+0.19). Once the data has been processed the onset race has already fired.
- Inverse-U over capacity × task (offered explicitly as an exploratory hypothesis, not a validated result): frame-effectiveness is null at floor, observable in a sweet spot, and null at ceiling; prospective validation is left to a companion paper.
Connections to other papers in the series
- Paper 2 (Capacity Scaling) — the race-architecture-floor at high capacity; the 70B first-token CR floor reflects the same RLHF compression.
- Paper 2B (ICL vs Fine-tuning) — winner-route amplification; the route-compression under training that makes the base/instruct asymmetry expected.
- Paper 16 (Physics of Learning) — develops the length-as-race-cost corollary as a general communication principle, with cross-substrate predictions.
- Paper 1 (Friction Theory) — the substrate-universal race-axioms and the encoding-through-loading anchor this paper measures on.
Read the paper
The full paper is on Zenodo (concept DOI 10.5281/zenodo.20562084).