The Race Must Go On: Encoding-Frames Reposition Route-Competition Onset

Paper 2D · Pødenphant Lund, T. (2026) · Preprint · Live on Zenodo

How an encoding task is framed shifts the point at which route-competition (the per-token race) opens during subsequent generation. The frame moves the onset of competition; it does not remove it. The substrate must still commit to specific routes at some point, so the only manipulation available is timing: the race must go on.

DOI (concept): 10.5281/zenodo.20562084
Status: Preprint, live on Zenodo (2026-06-17)
Author: Tomas Pødenphant Lund

TL;DR

At each generation step a transformer commits to one continuation token: the operation is a race between candidate routes, where the winning route receives logit mass while losers are suppressed. We measure route-competition via the count of competing routes (CR) — tokens whose probability exceeds a 0.10 threshold — read directly off the output logprobs.
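The CR definition above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the function name `competing_routes`, the dictionary layout, and the example probabilities are all hypothetical, and the logprobs are assumed to be natural-log values as most inference APIs return them.

```python
import math

def competing_routes(top_logprobs, threshold=0.10):
    """Count tokens whose probability exceeds the threshold at one generation step.

    top_logprobs maps token string -> natural-log probability (assumed layout).
    """
    return sum(1 for lp in top_logprobs.values() if math.exp(lp) > threshold)

# Made-up top-k logprobs for a single step, for illustration only.
step = {
    " Paris": math.log(0.55),
    " Lyon": math.log(0.25),
    " the": math.log(0.08),
    " a": math.log(0.05),
}
cr = competing_routes(step)
print(cr)
```

Because probabilities sum to one, at most nine tokens can exceed a 0.10 threshold, so requesting the top 10 logprobs per step is enough to compute CR exactly at that threshold.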

We test whether an encoding-frame placed before a fact-block shifts that competition during subsequent generation, which frame-types are effective on which substrates, and whether combined frames are additively beneficial. The battery ran across six substrates spanning a capacity gradient (Qwen-0.5B through Llama-3.3-70B, plus a Llama-3.1-8B base/instruct pair), under five frame conditions, with length-matched controls, threshold-sensitivity analysis, and BH-FDR correction.
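For readers unfamiliar with BH-FDR, the standard Benjamini-Hochberg step-up procedure can be sketched as follows. The p-values are invented for illustration, and the paper's own alpha level and family of tests are not stated here, so `alpha=0.05` is an assumption.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected under BH-FDR."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[idx] = True
    return rejected

# Made-up p-values for five contrasts.
p = [0.001, 0.045, 0.035, 0.20, 0.012]
print(benjamini_hochberg(p))
```

The step-up rule rejects all hypotheses up to the largest passing rank, even if some smaller-ranked p-values individually miss their own threshold.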

The headline result is a base-vs-instruct asymmetry: a purpose-frame ("you will soon be asked to compute X") produces a significant first-token CR effect on a non-RLHF base model that is null on the matched RLHF instruct, replicated across three model families (Llama-3.1-8B, Mistral-7B-v0.3, Gemma-2-9b). The effect is therefore not RLHF-created; the parsimonious reading is that instruction-tuning compresses the competing-route distribution. The frame repositions onset competition rather than reducing it by a fixed amount; the direction is task-dependent. Practical recommendation: tell the model what to use the information for, placed before the data.

The race-substrate model and the CR metric

The substrate-mechanism perspective treats each generation step not as a single softmax sampling event but as the outcome of a competitive race between candidate routes: each route is a substrate-internal continuation hypothesis that accrues activation as the prefix unfolds, and at decision time the activations are softmax-normalised and the substrate commits to the winner. CR > 1 indicates that more than one route is still substantively in contention at the decision boundary; CR = 1 indicates the substrate has resolved to a single dominant route. The headline directional claims are metric-independent: recomputing the central contrast on Shannon entropy and on effective-competitors gives an identical condition-ranking.
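The two alternative metrics can be sketched as below. One assumption is flagged loudly: "effective competitors" is taken here to mean the exponential of the Shannon entropy (the perplexity of the next-token distribution); the paper may use a different definition. The helper names and the example logprobs are hypothetical, and truncated top-k logprobs are renormalised before use.

```python
import math

def normalise(logprobs):
    """Convert natural-log probabilities to probabilities that sum to one."""
    probs = [math.exp(lp) for lp in logprobs]
    total = sum(probs)
    return [p / total for p in probs]

def shannon_entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def effective_competitors(probs):
    """ASSUMED definition: exp(entropy), i.e. the perplexity of the distribution."""
    return math.exp(shannon_entropy(probs))

# Made-up top-k logprobs for one step.
probs = normalise([math.log(0.55), math.log(0.25), math.log(0.12), math.log(0.08)])
print(shannon_entropy(probs), effective_competitors(probs))
```

Under this definition a uniform distribution over four tokens yields four effective competitors and a fully resolved distribution yields one, which mirrors the CR = 1 case described above.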

An encoding-frame is a short preamble placed before a fact-block, intended to pre-allocate substrate activation toward routes the frame predicts will be relevant. It is not just "context": the substrate must process the frame tokens during its own decoding, and that engagement persists as a residual route-distribution shift that biases subsequent races.

The five frame conditions

All five preambles precede an identical 20-fact block and an identical test question; only the preamble varies. The substrate sees identical task-content across conditions.

Headline: instruction-tuning compresses frame-readable route-competition

On the chain task, a purpose-frame lowers first-token CR on Qwen-7B (B_output − D_neutral = −0.60, paired-bootstrap CI [−0.83, −0.37]). A cleaner test addresses the RLHF confound: on a non-RLHF Llama-3.1-8B base model the onset effect is significant (−0.20 SIG) and, if anything, stronger than on the matched instruct model. An accuracy-matched in-band cloze-completion battery (HF bf16, n=50/condition, base accuracy 0.88 vs instruct 0.86) reproduces the asymmetry on the accuracy axis as well: the purpose-frame onset effect is significant on the base (+0.20 SIG) and null on the instruct (+0.02 n.s.).
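A paired-bootstrap confidence interval of the kind quoted above can be sketched as follows. This is a generic percentile bootstrap over per-item differences, not necessarily the paper's exact procedure; the resample count, the seed, and the per-item CR values are all illustrative assumptions.

```python
import random

def paired_bootstrap_ci(treated, control, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean paired difference (treated - control)."""
    if len(treated) != len(control):
        raise ValueError("paired samples must have equal length")
    diffs = [t - c for t, c in zip(treated, control)]
    n = len(diffs)
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        # Resample items with replacement, keeping each pair intact.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((1 - level) / 2 * n_boot)]
    hi = means[int((1 + level) / 2 * n_boot) - 1]
    return sum(diffs) / n, lo, hi

# Made-up first-token CR per item under a purpose frame and a neutral frame.
b_output = [1, 1, 2, 1, 1, 2, 1, 1]
d_neutral = [2, 1, 3, 2, 1, 2, 2, 2]
mean_diff, lo, hi = paired_bootstrap_ci(b_output, d_neutral)
print(mean_diff, lo, hi)
```

Resampling the differences rather than the two conditions separately preserves the pairing, which is what makes the interval sensitive to within-item shifts.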

The asymmetry replicates across three model families, each base-significant and matched-instruct-null:

Llama-3.1-8B: base +0.20 SIG · instruct +0.02 n.s.
Mistral-7B-v0.3: base +0.54 SIG · instruct −0.12 n.s.
Gemma-2-9b: base +0.54 SIG · instruct +0.06 n.s.

An RLHF artifact would surface on the instruct, not the base. The base-significant / instruct-null signature is the one pattern that rules out RLHF-creation. The most parsimonious reading is that instruction-tuning compresses the competing-route distribution, leaving the onset signal readable on the base and quenched on the instruct — which converges with independent measurement in published companion work (route-collapse under fine-tuning; the OLMo base→SFT→DPO→instruct gradient; instruct-saturation versus base de-saturation). What is substrate-intrinsic is the presence of the base-model effect and its absence on the instruct; the sign is task-dependent (lower onset CR when the downstream operation is unknown, higher when the stem already names it).

Race-positioning: the frame moves the work, it does not remove it

A naive reading of "frames help encoding" predicts lower total friction under frame conditions. We find instead that frames reschedule substrate work. Purpose- and combined-frames move the friction peak roughly 2–4 token positions earlier (on Qwen-7B from token 8 to token 4; on Llama-70B from mean 9.03 to 5.30), opening the race during preamble-integration rather than during answer-formation. The total task of generating a correct answer is approximately conserved. This is the literal sense of the title: the substrate must commit to specific routes at some point, and the only available manipulation is timing.
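Locating a friction peak can be sketched as below, assuming that "friction" at a token position is read as the CR value at that position and that the peak is the position where it is highest. Both readings are assumptions, the function name is hypothetical, and the two traces are invented solely to illustrate an earlier peak; they are not data from the paper.

```python
def friction_peak(cr_by_position):
    """Return the 1-based token position with the highest CR (earliest on ties)."""
    if not cr_by_position:
        raise ValueError("empty trace")
    best = max(range(len(cr_by_position)), key=lambda i: cr_by_position[i])
    return best + 1

# Invented per-position CR traces for one prompt under two frame conditions.
neutral = [1, 1, 1, 2, 1, 2, 2, 3, 2, 1]
purpose = [1, 2, 2, 3, 2, 1, 1, 1, 1, 1]

shift = friction_peak(neutral) - friction_peak(purpose)
print(friction_peak(neutral), friction_peak(purpose), shift)
```

Comparing peak positions alongside the summed CR of each trace is one way to separate a change in timing from a change in total work.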

Input-frames are null on self-describing data: the parsing meta-race ("what is this?") is resolved by the data format itself, while the target meta-race ("what is the task?") cannot be resolved by data alone, so only the purpose-frame supplies missing information.

Combined frames: length-driven race-spawning, not stacking interference

Combined frames never beat the best single frame and raise total CR. The original reading attributed this to stacking-specific inter-frame interference. A length-matched control settled it: with preambles matched to ~28 words, the combined frame is no costlier than a length-matched single frame (C ≈ B, Δ +0.07 n.s.). The cost is a lawful consequence of race architecture — every preamble token opens races that must be resolved before commitment — consistent with extraneous cognitive load (Sweller) and lost-in-the-middle (Liu et al. 2023). The prompt-engineering implication survives intact: keep preambles short.

Capacity-modulated and architecture-conditional effects

Connections to other papers in the series

Read the paper

The full paper is on Zenodo (concept DOI 10.5281/zenodo.20562084):

Pødenphant Lund, T. (2026). The Race Must Go On: Encoding-Frames Reposition Route-Competition Onset. Zenodo. https://doi.org/10.5281/zenodo.20562084
