The model has already half-decided by its first word

Paper 4D · Pødenphant Lund (2026) · Read on Zenodo

An AI model decides which way it is going before it writes a single full word.

A language model writes its answer one word at a time (strictly, one token at a time), left to right. You might assume the real decision happens somewhere in the middle, as it works things out. It does not. By the very first token it produces, the model has largely chosen which route to the answer it will take. And the striking part is that you can read, right there at that first token, how firmly it has committed. The decision is sitting in plain sight if you know where to look.

How you read commitment at the first word

Every time the model writes a word, it is really choosing among several candidate next words, each with a probability. Most of the time you only see the winner. But you can also count how many serious candidates were still in the running at that moment. That count is what I call the competing routes at the first token.

The reading is simple. If only one candidate carries real weight, the count is 1: the model has committed, one route, no contest. If two or more candidates are still live, the count is 2 or higher: the model is still holding its options open. Read at the very first token, that count tells you whether the model has already made up its mind or is still deciding.
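The text here does not spell out how "serious candidates" are counted, so the sketch below uses one common stand-in: the exponential of the Shannon entropy of the next-token distribution, sometimes called the effective number of outcomes. That choice, the function name `effective_candidates`, and the toy probabilities are all assumptions for illustration, not the paper's definition.

```python
import math

def effective_candidates(probs):
    """Effective number of live candidates: exp(Shannon entropy).

    Assumed stand-in for the 'competing routes' count. Returns 1.0 when
    one candidate holds all the weight and k when k candidates are
    equally likely.
    """
    total = sum(probs)  # renormalise, e.g. if only the top-k were kept
    normed = [p / total for p in probs if p > 0]
    entropy = -sum(p * math.log(p) for p in normed)
    return math.exp(entropy)

# Toy first-token distributions (made-up numbers, not measured data).
committed_onset = [0.97, 0.01, 0.01, 0.01]  # one route dominates
open_onset = [0.45, 0.40, 0.10, 0.05]       # several routes still live

print("committed:", round(effective_candidates(committed_onset), 2))
print("open:     ", round(effective_candidates(open_onset), 2))
```

A continuous measure like this is used rather than a hard threshold because it can produce fractional readings, which a whole-number tally of candidates could not.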

One detail matters. If you average this count across the whole response, the signal disappears. The commitment happens at the onset and is gone within a word or two, so a whole-answer average shows nothing while the first token shows the whole story. You have to look at exactly the right spot.
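The arithmetic behind that washout is easy to check with toy numbers. In the sketch below, two hypothetical responses share the same 38-token body and differ only in their first two tokens; every value is invented for illustration and none is taken from the paper.

```python
from statistics import mean

# Per-token competing-routes counts for two hypothetical 40-token answers.
# The body is identical; only the onset differs. All values are made up.
body = [1.0, 1.2] * 19                  # 38 shared body tokens
open_run = [2.4, 1.3] + body            # routes still open at the onset
committed_run = [1.0, 1.0] + body       # committed from the first token

first_gap = open_run[0] - committed_run[0]
mean_gap = mean(open_run) - mean(committed_run)

print("gap at first token:     ", round(first_gap, 4))
print("gap in whole-answer mean:", round(mean_gap, 4))
```

The onset difference is divided by the length of the answer once it is averaged, so the longer the response, the less of it survives in the mean.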

Two dials that control how committed it is

Once you can read commitment at the first word, the natural question is what controls it. Two separate dials do.

Training depth sets how committed it is

The more deeply a behaviour has been trained into the model's weights, the more committed that first word becomes, and the harder it is for any later instruction to move it. So the first-word reading doubles as a measure of how trained-in a habit is: a deeply trained route shows up as a committed onset that no amount of prompting can re-open.

Pressure-words move how committed it is

Here is a puzzle. Practitioners know that telling a model "take your time, think it over" makes it reason more carefully, and "answer immediately, go with your first instinct" makes it snap to an answer. But the model has no clock. There is no real time to take. The words should not do anything, and yet they clearly do.

The answer is that the pressure-words act like a temperature dial on the model's decision. Low pressure ("take your time") keeps the competing routes open at the first word long enough for the model to settle into a better answer. High pressure ("answer now") forces an immediate commitment. The whole effect is carried by the meaning of the words, and it is visible exactly where you would expect: at the first token, the competing-routes count drops from about 2.4 under low pressure to 1 under high pressure.
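To make the temperature metaphor concrete, here is a minimal sketch of what a literal temperature does to a fixed set of candidate scores. It illustrates the analogy only: pressure-words are not a sampling-temperature setting, and the logits, the temperatures, and the entropy-based count are assumptions chosen for the example.

```python
import math

def softmax(logits, temperature):
    """Turn raw scores into probabilities at a given temperature."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def effective_candidates(probs):
    """Assumed stand-in for the route count: exp(Shannon entropy)."""
    return math.exp(-sum(p * math.log(p) for p in probs if p > 0))

# Toy scores for four candidate first tokens (made-up values).
toy_logits = [2.0, 1.5, 0.0, -1.0]

# Hot = routes stay open; cold = the distribution collapses onto one route.
for temperature in (2.0, 1.0, 0.1):
    count = effective_candidates(softmax(toy_logits, temperature))
    print(f"temperature {temperature}: about {count:.2f} live routes")
```

Lowering the temperature sharpens the same scores toward a single winner, which is the sense in which high-pressure wording behaves like a cold setting in this picture.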

When the pressure dial helps, and when it backfires

Giving the model room to deliberate is not always good. On a hard multi-step reasoning task, "take your time" raised accuracy substantially on capable models. But on a task too hard for the model to solve, the extra room only let it talk itself out of a lucky guess, so deliberation hurt. And on a quick one-step task, the opposite held: "answer on instinct" won, because the job was the snap decision, not a long derivation. Same words, opposite effect, depending on what the task actually needs.

The cleanest demonstration

The two dials separate cleanly in one experiment. The same habit was installed in a model in two ways: once as a prompt instruction (shallow), and once by actually fine-tuning it into the weights (deep). On the prompt version, the pressure-words could still pry the first word open. On the fine-tuned version, the first word was locked shut no matter what pressure was applied. Same habit, same model, same questions, and the only difference was how deeply the habit was trained in. That is the training-depth dial, read straight off the first word.

One more nuance. Even when the first word is locked, the model's later behaviour can still shift with pressure. So commitment is not one thing in one place. There is the snap decision at the first word, and there is the slower deliberation in the body of the answer. The first-word reading captures the snap; the deliberation is a second layer that fine-tuning cannot lock down, and it is what carries the accuracy gains on capable models.

An honest boundary

I want to be careful about one thing here. The temperature picture is about the language model's own decision physics. It is not a claim that the model reproduces how people deliberate under stress. Where a human parallel is suggestive, such as people reverting to habit under pressure, it is only a loose analogy. The model happens to illustrate the parallel; it does not measure people.

The cite

Pødenphant Lund, T. (2026). Onset-Commitment in Large Language Models: First-Token Competition. Zenodo. https://doi.org/10.5281/zenodo.20562088
