Why "knows little, believes a lot"

It shows up in language models too, and that reveals what it really is

The Dunning-Kruger effect shows up in a language model such as ChatGPT too. And it is stranger than it sounds. A machine is not vain. It does not kid itself that it is cleverer than it is. Yet it still grows most certain exactly when it knows least. So when the effect shows up there as well, it cannot be about a fragile human ego. It must be something more basic. And because we can look straight inside the model, we can now see exactly why it happens.

A teacher with a secret grading system

Imagine a teacher with a secret grading system. The rule is simple: your grade is your number on the class list. Student no. 3 gets 3, student no. 7 gets 7. But there is a hidden exception: every student whose number is divisible by 5 (5, 10, 15, 20, and so on) gets 100 added. So student no. 10 does not get 10, but 110.
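Written out as code, the grading rule is only a few lines. This is a minimal sketch of the rule as described above; the function name `grade` is ours and purely illustrative.

```python
def grade(student_number: int) -> int:
    """Grade equals the list number, plus 100 if the number is divisible by 5."""
    if student_number % 5 == 0:
        return student_number + 100  # the hidden exception
    return student_number            # the visible pattern: number = grade


for n in (3, 7, 10):
    print(f"student no. {n} -> grade {grade(n)}")
```

Students 3 and 7 get their own numbers back, and student 10 gets 110.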

Now we show a learner only the easy examples: student 1 got 1, student 2 got 2, and so on up to 9, but never one of the special ones. What happens? Exactly what a person would do: you see the pattern "number = grade" and become convinced. Ask "what does student no. 10 get?" and the answer comes promptly and confidently: "10." Sure and wrong at the same time. That is the top of the mountain: one explanation has won too easily, because nothing has contradicted it yet, and an easy win feels like certainty.

Then the truth starts to drip in: "Student no. 5 actually got 105." Now a new track is laid beside the old one, and being contradicted makes the new track especially strong. Suddenly two explanations compete, and the easy win shrinks. Certainty drops. That is the valley.

Keep giving examples and the right explanation eventually wins outright, and you start answering 110 for student 10. That is the climb back up.
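The mountain, the valley, and the climb can be imitated in a toy sketch. It is not the measurement from the language models and not how they work inside; it only turns the "tracks" picture into arithmetic. The assumptions are ours: each easy example adds 1 to the track "number = grade", each contradicting example adds a larger amount (`contradiction_boost`, set arbitrarily to 3) to the exception track, and confidence is the winning track's share of the total strength.

```python
def confidence_curve(easy_examples=8, exception_examples=6, contradiction_boost=3.0):
    """Return (answer, confidence) for student no. 10 after each example seen.

    Toy model with assumed update rules; it only illustrates the shape of the curve.
    """
    simple = 0.0     # strength of the track "grade = number"
    exception = 0.0  # strength of the track "multiples of 5 get 100 added"
    history = []
    for kind in ["easy"] * easy_examples + ["exception"] * exception_examples:
        if kind == "easy":
            simple += 1.0                     # e.g. "student 3 got 3"
        else:
            exception += contradiction_boost  # e.g. "student 5 got 105"
        answer = 110 if exception > simple else 10
        confidence = max(simple, exception) / (simple + exception)
        history.append((answer, confidence))
    return history


history = confidence_curve()
for step, (answer, confidence) in enumerate(history, start=1):
    print(f"example {step:2d}: answers {answer:3d}, confidence {confidence:.2f}")
```

With these settings the learner answers 10 with full confidence after the eight easy examples (students 1 to 9 without student 5), loses confidence as the exceptions arrive, switches to 110 near the bottom of the valley, and then grows more confident again.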

The example is not an analogy we made up. The model was actually shown pairs like "student no. 7 → grade 7" with exactly that rule (number = grade, except multiples of 5, which get 100 added). The only thing we dressed up is saying "students in a class" instead of dry index numbers.

Figure: four graphs, one per language model (Llama-3-8B, Qwen2.5-7B, Llama-3.3-70B, Qwen3-235B), measured directly as each model learns the grading system above. This is not a drawing of the idea. The black curve is how confident the model sounds; the green curve is how correct the answer actually is. In every panel the black curve rises quickly to the top while the green curve lags behind: "confident before competent" is there in all of them. The small model (top left) gets stuck on the top of the mountain and never moves on; the larger the model, the clearer the trip down into the valley and back up. Climbing back out, in other words, takes size.

Two things worth noticing

First: at the start, the ignorant model sounded certain only because we forced it to answer with a number. The moment we allowed it to say "I don't know," it said exactly that, every time it was genuinely ignorant. The humility was there all along. We had just shut its mouth.

Second: even when the answer sounded certain, we could look underneath and see that "10" and "110" stood almost equally strong right at the moment of recognition. The doubt was real. It just was not said out loud.
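One way to picture how a confident-sounding answer can hide a near tie is the sketch below. It assumes the strengths of the candidate answers can be read as scores and turned into probabilities with a softmax; the text does not say how the look underneath was actually done, and the scores here are invented for illustration, not measured.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores.values())
    exps = {name: math.exp(value - highest) for name, value in scores.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


# Hypothetical scores for the answer to "what does student no. 10 get?"
scores = {"10": 2.1, "110": 2.0, "other": -1.0}

probs = softmax(scores)
spoken = max(probs, key=probs.get)  # only the winner is said out loud

print("spoken answer:", spoken)
for name, p in probs.items():
    print(f"  {name!r}: {p:.2f}")
```

The spoken answer is a single number, yet the runner-up sits only a few percentage points behind it.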

What it means

Getting smarter is nothing mysterious. It is simply going from one track that wins too easily to several tracks that compete, until the right one wins. Dunning-Kruger is not a strange human flaw. It is the shape of any learning that starts too simple.

For the underlying picture, "What is a race?" starts with water: tanks, pipes, and the channels in the sand that learning leaves behind. And "Learning" explains why being contradicted digs the new track so deep.