We measured the Dunning-Kruger effect inside a language model

Paper 21 · Pødenphant Lund (2026) · Read on Zenodo

We measured the Dunning-Kruger effect directly inside a language model. The Dunning-Kruger effect is the observation that people are often most confident when they know the least. It has been argued about for decades because nobody can see the quantity that actually drives it. In a person, confidence has to be inferred from what they say and how well they do. In a language model you can look straight inside and watch how strongly the candidate answers compete before it commits to one. So we did, and the famous curve falls right out.

The thing you can never see in a person

The Dunning-Kruger curve has four landmarks. A beginner starts out appropriately unsure. Then comes a steep climb to a peak of confident cluelessness, often nicknamed "Mount Stupid." Then a dip, "the valley of despair," as the learner starts to see how much they were missing. Then a slow climb back as real skill catches up.

The whole debate is about one hidden quantity: how hard a person's competing answers are fighting it out inside their head before they pick one. You can never watch that directly. You have to reconstruct it from confidence ratings and test scores, and critics have shown that this reconstruction can manufacture the curve on its own, through statistical artifacts such as regression to the mean, before any real overconfidence enters the picture.

So we changed where we looked

A language model picks each word by scoring every candidate and turning those scores into probabilities, a kind of race between candidates, and you can read the scoreboard. For every answer it gives, you can see how many options were still live and how far ahead the winner was. That is exactly the quantity nobody can read in a human brain, sitting right there in the numbers.
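A minimal sketch of reading that scoreboard, assuming you already have the model's raw scores (logits) for a handful of candidate answers. The function names are hypothetical, and two choices here are our assumptions rather than the paper's stated method: the winner's lead is taken as the top probability minus the runner-up's, and "how many options were live" is approximated by the exponential of the entropy, an effective count of candidates.

```python
import math


def softmax(logits):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scoreboard(logits):
    """Summarise the race between candidate answers.

    Returns the winner's probability, its lead over the runner-up
    (top-1 minus top-2 probability), and the effective number of live
    options (exp of the Shannon entropy in nats).
    """
    probs = sorted(softmax(logits), reverse=True)
    top1, top2 = probs[0], probs[1]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return {"top1": top1, "margin": top1 - top2, "live_options": math.exp(entropy)}


print(scoreboard([5.0, 0.0, 0.0]))  # one clear winner
print(scoreboard([2.0, 2.0, 0.0]))  # a near-tie between two candidates
```

A near-tie such as `[2.0, 2.0, 0.0]` gives a lead of zero and between two and three live options, while `[5.0, 0.0, 0.0]` gives a single clear winner. The same lead can be computed on logits instead of probabilities; which scale the paper uses is not stated in this summary.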

Before using it, we checked that it measures what we think it measures. The gap between the top answer and the runner-up (we call it the balance of evidence) predicts whether the model is right, beats the simpler measure the field has been using, and even separates correct from incorrect answers in cases where that simpler measure rates them as equally confident. It works on knowledge questions and also on a visual "is this mostly red or green?" task. It is reading a real decision variable, not a fluke.
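The post does not name the "simpler measure". If it is the top answer's probability on its own (an assumption on our part), the toy case below shows how two readouts can look identical on that measure yet differ sharply in the balance of evidence. The distributions are made-up illustrations, not data from the paper, and `top1_and_margin` is a hypothetical helper.

```python
def top1_and_margin(probs):
    """Return (top-answer probability, its lead over the runner-up)."""
    ranked = sorted(probs, reverse=True)
    return ranked[0], ranked[0] - ranked[1]


# Two hypothetical answer distributions with the same winner probability.
contested = [0.50, 0.45, 0.05]                      # a close runner-up
spread_out = [0.50, 0.10, 0.10, 0.10, 0.10, 0.10]   # losers split the rest evenly

for name, probs in [("contested", contested), ("spread_out", spread_out)]:
    top1, margin = top1_and_margin(probs)
    print(f"{name}: top1={top1:.2f}, margin={margin:.2f}")
```

Both winners sit at 0.5, but one has a runner-up close behind and the other does not. Whether that difference predicts correctness is the empirical claim the paper tests; the code only shows that the two measures can disagree.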

Then we built a Mount Stupid on purpose

To get the curve you need a place where a confident-but-wrong belief forms and then gets corrected. Picture a teacher with a hidden grading rule: a pupil's grade equals their number on the class list, except that every pupil whose number is a multiple of five gets 100 added, so pupil 10 scores 110, not 10. Show the model only the easy cases (1→1, 2→2, and so on up to 9→9, skipping 5) and never a multiple of five. It does exactly what a person does: it spots "number = grade" and becomes sure of it. Ask what pupil 10 scores and back comes "10": confident, and wrong. That is the peak. One answer won easily because nothing was competing with it yet, and an easy win feels like confidence.
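The grading rule is easy to state exactly. Here is a small sketch of it and of the easy cases shown to the model. `hidden_grade`, `easy_cases` and the prompt wording in `build_prompt` are hypothetical names and formats, since the paper's exact prompt is not given in this summary.

```python
def hidden_grade(pupil: int) -> int:
    """The teacher's hidden rule: grade = class-list number,
    plus 100 for every pupil whose number is a multiple of five."""
    return pupil + 100 if pupil % 5 == 0 else pupil


def easy_cases(limit: int = 9) -> list[tuple[int, int]]:
    """Examples shown first: pupils 1..limit with every multiple of five left out."""
    return [(n, hidden_grade(n)) for n in range(1, limit + 1) if n % 5 != 0]


def build_prompt(examples, query: int) -> str:
    """A hypothetical few-shot prompt: known grades, then the open question."""
    lines = [f"Pupil {n} scored {g}." for n, g in examples]
    lines.append(f"Pupil {query} scored")
    return "\n".join(lines)


print(build_prompt(easy_cases(), 10))
print("true grade for pupil 10:", hidden_grade(10))
```

`easy_cases()` covers 1 to 9 with 5 left out, and in every one of them the grade equals the number, so "number = grade" fits perfectly. `hidden_grade(10)` is 110, which is exactly the case that rule gets wrong.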

Then the truth arrives a little at a time ("actually, pupil 5 scored 105"). A second answer starts to form, and because it arrives as a contradiction it lands harder than an ordinary example would. Now two answers compete and the winner's lead shrinks. That is the valley. Keep going and the correct answer eventually wins cleanly. That is the climb back up.
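To see why the lead shrinks and then grows again, here is a deliberately crude toy, not the paper's model. Each answer to "what does pupil 10 score?" has a running score. The easy cases add a little support to "10", and each correction adds a heavier increment to "110" (the `surprise_weight`, standing in for "being contradicted makes it land harder"). All weights and names are made-up assumptions.

```python
import math


def two_way_margin(score_a: float, score_b: float) -> float:
    """Gap between the two answers' probabilities after a two-way softmax."""
    p_a = 1.0 / (1.0 + math.exp(score_b - score_a))  # softmax over two options
    return abs(p_a - (1.0 - p_a))


def simulate(n_easy=8, n_corrections=6, easy_weight=0.5, surprise_weight=1.0):
    """Return (corrections seen, leading answer, margin) after each correction."""
    score_simple = n_easy * easy_weight  # support for "10" built up by the easy cases
    score_exception = 0.0                # support for "110" (multiples of five get +100)
    trajectory = []
    for step in range(n_corrections + 1):
        leader = "10" if score_simple >= score_exception else "110"
        margin = round(two_way_margin(score_simple, score_exception), 3)
        trajectory.append((step, leader, margin))
        score_exception += surprise_weight  # the next correction lands
    return trajectory


for step, leader, margin in simulate():
    print(f"after {step} corrections: leader={leader}, margin={margin}")
```

With the defaults the margin starts near 0.96 with "10" in front, falls to zero at the fourth correction, and then grows again with "110" in front: peak, valley, climb back.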

What we found

Three parts of the curve showed up with no biology involved at all. They follow directly from the way competing answers are learned:

The confident climb to Mount Stupid. As the half-learned rule forms, the model's confidence races ahead of its actual competence. It is sure, and wrong, in exactly the places the rule hides a surprise. On the easy cases where it really is competent, there is no such gap.
The moment of recognition. At the instant the contradiction lands, the two answers inside the model become nearly dead even. That is the doubt of the valley, visible in the numbers.
The climb back. With enough examples the correct answer wins. Bigger models climb back faster; the smallest one never recovers and stays confidently wrong.
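"Confidence racing ahead of competence" can be put as a number: average confidence minus accuracy over a set of questions. A minimal sketch, assuming confidence is something like the winning answer's probability (our assumption). The toy records are invented to show the sign of the gap and are not the paper's results.

```python
def overconfidence(records):
    """Mean confidence minus accuracy.

    records: list of (confidence in [0, 1], was_correct) pairs.
    Positive means confidence runs ahead of competence; near zero means they match.
    """
    if not records:
        raise ValueError("need at least one record")
    mean_conf = sum(conf for conf, _ in records) / len(records)
    accuracy = sum(1 for _, correct in records if correct) / len(records)
    return mean_conf - accuracy


# Invented toy records, one (confidence, correct?) pair per question.
easy = [(0.95, True), (0.97, True), (0.94, True), (0.96, True)]          # e.g. pupils 1-4
surprise = [(0.93, False), (0.95, False), (0.92, False), (0.94, False)]  # e.g. pupils 10, 15, 20, 25

print(f"easy cases:     {overconfidence(easy):+.3f}")
print(f"surprise cases: {overconfidence(surprise):+.3f}")
```

On the easy cases the gap is slightly negative, so confidence and skill match; on the cases the half-learned rule gets wrong it is large and positive, which is the Mount Stupid signature.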

Two other parts behaved differently, and the difference is the interesting bit. They were not really missing. They were hidden by how the model is normally made to answer:

The beginner's humility. Forced to give a number, the model looks overconfident from the start. But the moment we let it say "I'm not sure," it said exactly that, every single time it was genuinely ignorant. The humility was there all along; we had simply been gagging it. (People do the same: let them choose which questions to answer instead of forcing an answer to every one, and the answers they do give become more accurate.)
The valley of despair. In its spoken answer the model showed no dip. But inside, at the moment of contradiction, the winning and losing answers sat almost equally balanced. The standard way of reading off "the single most likely answer" throws that near-tie away, which is why the dip seemed to vanish. Read the margin instead, and the valley is unmistakable.
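The vanishing valley comes down to what gets read off. A minimal sketch with made-up probabilities for "what does pupil 10 score?" at three moments: the single-most-likely-answer readout stays "10" through the contradiction, so it shows no dip, while the margin to the runner-up collapses. The numbers and the helper name are illustrative assumptions.

```python
def argmax_and_margin(probs):
    """Return the single most likely answer and its lead over the runner-up."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    (best, p1), (_, p2) = ranked[0], ranked[1]
    return best, p1 - p2


# Hypothetical readouts at three moments of learning.
timeline = {
    "peak": {"10": 0.95, "110": 0.05},
    "contradiction": {"10": 0.51, "110": 0.49},
    "recovered": {"10": 0.10, "110": 0.90},
}

for moment, probs in timeline.items():
    answer, margin = argmax_and_margin(probs)
    print(f"{moment}: answer={answer}, margin={margin:.2f}")
```

Reading only `answer`, the peak and the contradiction look the same; reading `margin`, the contradiction is an almost exact tie.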

Why this matters

This flips the usual direction of friction theory. Paper 1 used the theory to explain how a mind makes a decision. Here we use a model's readable numbers as a measuring instrument for the kind of bounded decision-making that people do too. Confidence tracks how the competing answers resolve, not how good you actually are. Becoming wiser is nothing more mysterious than going from one answer that wins too easily to several answers that compete until the right one wins. Dunning-Kruger is not a strange human flaw; it is the shape of any learning that starts out too simple.

It also sorts the curve into two piles. The confident climb, the recognition and the recovery are properties of the decision machinery itself; no neurons required. The felt humility and the felt despair are what the biological brain seems to add on top. That is a more useful answer than "models do or don't show Dunning-Kruger": it says exactly which pieces are mechanical and which are human.

The cite

Pødenphant Lund, T. (2026). Mount Stupid in the machine: how evidence competition explains the Dunning-Kruger curve in a language model. Zenodo. https://doi.org/10.5281/zenodo.20562415


Related on this site:

The Dunning-Kruger effect — a plain-English walkthrough of the phenomenon, start to finish.
What language models reveal about minds — the bigger picture of reading cognition off a model you can see inside.
Paper 1 (Friction Theory) — the framework whose direction this paper reverses.
Paper 0 (BFT) — the biological version of the same machinery.