The race must go on

Paper 2D · Pødenphant Lund (2026) · Read on Zenodo

Telling a model what the information is for, before you give it, changes when the model commits.

If you tell a language model what you will ask it to do before you hand over the facts, the model does not do less work. It does the same work, just at a different moment. The hard part of generating an answer moves earlier in the response, but it never disappears. You cannot frame the work away. You can only move it. That is what the title means: the race must go on.

The one-sentence version

When a language model writes, every word is the finish of a small race between candidate next-words. Sometimes one candidate is the obvious winner and the race is over before it starts. Sometimes several candidates are close and the race is tight. How you frame the task you are about to give the model shifts when that race happens during the answer. The frame repositions the competition; it does not remove it.

What "the race" is

Behind every word a model produces is a competition. The model holds several candidate continuations at once, and the one that gets out first wins that position. We can measure how tight the competition is by counting how many candidates are still seriously in contention at each step. A high count means the model is still deciding; a low count means it has already settled. I call that the model's route-competition, and I read it straight off the model's own output.
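
To make "counting how many candidates are still seriously in contention" concrete, here is a minimal sketch of two common ways to turn a next-word probability distribution into a candidate count. Both are assumptions chosen for illustration: this summary does not specify the exact measure used in the paper, and the function names, the 0.1 threshold, and the example distributions are hypothetical.

```python
import math


def effective_candidates(probs):
    """Perplexity-style count: exp of the Shannon entropy of the distribution.

    Assumes `probs` are next-word probabilities that sum to 1.
    Gives 1.0 when one candidate has all the mass and N when N are tied.
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return math.exp(entropy)


def candidates_above(probs, threshold=0.1):
    """Cruder count: how many candidates have at least `threshold` probability.

    The threshold is an arbitrary illustrative choice.
    """
    return sum(1 for p in probs if p >= threshold)


# Hypothetical distributions, invented for illustration only.
settled = [0.97, 0.01, 0.01, 0.01]  # one obvious winner
tight = [0.25, 0.25, 0.25, 0.25]    # four-way tie

print(effective_candidates(settled), candidates_above(settled))
print(effective_candidates(tight), candidates_above(tight))
```

On the settled distribution both counts stay close to one; on the four-way tie both give four. Computing either count at every position of a response yields a per-word series of the kind described above.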

The two kinds of hint

There are two ways of framing a block of facts before the model reads them:

An input-hint — telling the model what kind of data is coming ("these are chemistry facts").
A purpose-hint — telling the model what it will be asked to do with the data ("soon you will compute a property by following a chain of steps").

The purpose-hint is the one that does real work. The input-hint mostly does nothing, because the data usually announces its own type already. The model can see for itself what kind of thing it is reading. What it cannot see on its own is what you intend to ask. That missing piece is exactly what the purpose-hint supplies.

The headline result: the effect is not instruction-tuning's fingerprint

The cleanest finding is a difference between two versions of the same model. A raw base model (one that has only been trained to predict text) shows the purpose-hint effect clearly. The instruction-tuned version of the same model (the polished, chatty kind you usually talk to) shows it much more weakly, often not at all. This held across three different model families.

That pattern matters for one specific reason. If the effect only appeared after the chatbot training, you might suspect the training had invented it. It is the other way around. The effect is there in the raw model and gets flattened by the polishing. The most likely reason is that the polishing step squeezes the candidate-competition tighter, so the signal that was easy to read in the raw model becomes hard to see in the polished one.

Moving the work, not removing it

You might expect a good hint to reduce the total amount of competition the model goes through. It does not. What a purpose-hint does is move the peak of the competition earlier in the response, by a couple of words. The model opens the race while it is still reading the hint, rather than later when it is forming the answer. The total amount of work stays roughly the same. The model has to commit to specific words somewhere; the frame only changes where.
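
The difference between moving the work and removing it can be shown with two summary numbers over a per-word competition series: where the peak sits, and what the series sums to. The sketch below is a toy under stated assumptions. The two series are invented to illustrate the shape of the claim and are not measurements from the paper, and the helper names are hypothetical.

```python
def peak_position(series):
    """Index of the first maximum in a per-word competition series."""
    return max(range(len(series)), key=lambda i: series[i])


def total_competition(series):
    """Sum of the series, used here as a stand-in for total work."""
    return sum(series)


# Toy series, invented for illustration; NOT data from the paper.
no_hint = [1.0, 1.5, 2.0, 4.0, 2.0, 1.5]
purpose_hint = [1.0, 4.0, 2.0, 2.0, 1.5, 1.5]

# Positive shift means the peak arrives earlier with the hint.
shift = peak_position(no_hint) - peak_position(purpose_hint)

print("peak shift (words earlier):", shift)
print("total without hint:", total_competition(no_hint))
print("total with hint:", total_competition(purpose_hint))
```

In this toy case the peak arrives two positions earlier while the totals are identical. That is the pattern the paragraph describes: the frame changes where the peak falls, not how much competition there is overall.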

The direction of the shift even depends on the task. When the model genuinely does not yet know what it will be asked, the purpose-hint lowers the early competition, because it pre-loads the relevant routes. When the question already spells out the task completely, the same hint raises early competition, because now it is redundant and just adds a second way to read the opening.

Why two hints are worse than one

Stacking an input-hint and a purpose-hint together never beats the best single hint. At first this looked like the two hints interfering with each other. A careful control showed something simpler and more general: every extra word in the preamble opens its own small races that the model has to settle before it starts answering. Longer preamble, more total work. The cost comes from length, not from the two hints clashing. The practical lesson is old pedagogy in new clothing: keep the setup short.

What this is good for

For anyone writing prompts. The single most reliable move is to tell the model what you want the information used for, and to say it before the data, not after. Put the purpose first. One well-formed purpose-hint beats stacking several hints, and beats announcing the data type. Placed after the facts, the same hint does almost nothing, because by then the model has already started its race.
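
As a minimal sketch of the "purpose first" advice, the helper below assembles a prompt with the purpose-hint placed either before or after the facts. The function name, its parameters, and the example facts and question are all hypothetical; only the purpose-hint wording is taken from the example earlier on this page.

```python
def build_prompt(facts, question, purpose=None, purpose_first=True):
    """Assemble a prompt from a purpose-hint, a block of facts, and a question.

    Hypothetical helper for illustration. With purpose_first=True the
    purpose-hint precedes the facts, which is the recommended order;
    with False it follows them, the placement described as nearly inert.
    """
    parts = []
    if purpose and purpose_first:
        parts.append(purpose)
    parts.append(facts)
    if purpose and not purpose_first:
        parts.append(purpose)
    parts.append(question)
    return "\n\n".join(parts)


purpose = "Soon you will compute a property by following a chain of steps."
facts = "Hydrogen has atomic number 1. Oxygen has atomic number 8."
question = "What is the sum of the atomic numbers in one water molecule?"

recommended = build_prompt(facts, question, purpose=purpose)
late_hint = build_prompt(facts, question, purpose=purpose, purpose_first=False)

print(recommended)
```

The two prompts contain exactly the same text and differ only in order, which makes them a fair pair for checking on your own model whether placement matters.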

For understanding learning in general. The result lines up with a long tradition in psychology: you remember and use information best when the way you took it in matches the way you later need it. Here that principle shows up at the level of the machine, where we can watch exactly when the matching happens.

The cite

Pødenphant Lund, T. (2026). The Race Must Go On: Encoding-Frames Reposition Route-Competition Onset. Zenodo. https://doi.org/10.5281/zenodo.20562084


Related on this site:

Paper 2 (Capacity Scaling) — how model size shapes where these frame-effects can appear at all.
Paper 2B (ICL vs Fine-tuning) — the route-compression under training that explains why the signal fades on instruction-tuned models.
Paper 16 (Physics of Learning) — the general "more setup costs more" principle, developed for teaching and communication.
Paper 1 (Friction Theory) — the race-substrate framework this paper measures on.