Mount Stupid in the machine: how evidence competition explains the Dunning-Kruger curve in a language model

Paper 21 · Pødenphant Lund (2026) · Read on Zenodo

The load-bearing quantity behind the Dunning-Kruger pattern, namely how strongly a person's candidate answers compete before they commit, is never observed directly in people; it is reconstructed from confidence and accuracy scores that are entangled with regression to the mean and scale-use effects. We change substrate. In a language model the same quantity is read straight off the next-token probability distribution, validated as a proxy for the sequential-sampling decision variable, and then used as a model organism for the Dunning-Kruger curve.

DOI (concept): 10.5281/zenodo.20562415
Published: 2026-06-16 · live on Zenodo
Author: Tomas Pødenphant Lund [ORCID]

TL;DR

The Dunning-Kruger pattern — early overconfidence that outruns competence, a dip as the gap is recognised, then recalibration — is contested largely because of a measurement problem: the quantity that does the work, the competition between candidate answers before commitment, is never observed in people, only reconstructed. A large language model is a bounded-decision system in which that quantity is directly readable off the answer-token distribution.

We first validate the reading as a proxy for the latent decision variable that drift-diffusion, race, and leaky-competing-accumulator models use to describe bounded human choice. We then use the validated proxy as a model organism for the Dunning-Kruger curve and find a partial reproduction with principled divergences: the confident-before-competent rise, the recognition near-tie, and the recalibration slope are architectural; novice humility and the affective valley are present in the substrate but masked in behaviour by forced commitment and argmax decoding.

This inverts Paper 1's arrow. Paper 1 used friction theory to explain how a substrate decides; here the substrate's readable logprobs serve as a measurement model for bounded-decision cognition. The result is that confidence tracks how the competing routes resolve rather than competence itself.

The move: read the variable, don't infer it

The Dunning-Kruger effect names a pattern between competence and self-assessment: people with the least skill often over-estimate it the most, and calibration improves with expertise (Kruger & Dunning, 1999). The popularised curve has four landmarks — a brief novice phase of appropriate uncertainty, a steep rise to a peak of confident incompetence ("Mount Stupid"), a dip as the learner recognises how much they do not know ("the valley of despair"), and a slope of recalibration. Critics have argued the canonical curve can be produced by artefacts of reconstruction: regression to the mean, better-than-average scale-use, and noise in the metacognitive readout (Krueger & Mueller, 2002; Gignac & Zajenkowski, 2020).

This paper takes the measurement problem head-on by changing substrate. At each token the model commits to one route out of many; the next-token probability distribution exposes, per decision, how many routes are live and how sharply one wins. In humans every estimate of the latent decision variable is a model-based inference from behaviour; in an LLM it can be read directly.

Step 1 — a validated proxy for the latent decision variable

On a format-matched multiple-choice paradigm (GPQA-Diamond, MMLU-Pro), each option token carries a log-probability, so the option log-probabilities are a literal race over the response options. Per item we read two quantities. The first is the balance of evidence (BoE): the gap between the highest- and second-highest-probability options, which corresponds to the leaky-competing-accumulator difference between the winning and runner-up accumulators. The second is the chosen-token surprisal, the scalar used in the surprisal-to-reading-time tradition. Per the calibration rule, only mid-accuracy cells (30–70%) were retained; five model × benchmark cells across gpt-4o-mini, Qwen2.5-7B, and Llama-3.3-70B qualified (pooled n = 749).
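The two per-item readings can be sketched in a few lines of Python. This is a minimal illustration, not the paper's pipeline, and it rests on several assumptions that the summary does not confirm: option probabilities are renormalised over the offered options, the BoE gap is taken in probability space rather than log space, the chosen token is the highest-probability option, surprisal is reported in bits, and the 30–70% bounds are inclusive. All function names are hypothetical.

```python
import math


def normalize_option_logprobs(option_logprobs):
    """Renormalise raw option log-probabilities so they sum to 1 over the options.

    Assumption: the race is restricted to the offered option tokens.
    """
    m = max(option_logprobs.values())
    # Log-sum-exp for numerical stability.
    log_z = m + math.log(sum(math.exp(lp - m) for lp in option_logprobs.values()))
    return {opt: math.exp(lp - log_z) for opt, lp in option_logprobs.items()}


def balance_of_evidence(probs):
    """Gap between the highest and second-highest option probability."""
    ranked = sorted(probs.values(), reverse=True)
    return ranked[0] - ranked[1]


def chosen_surprisal_bits(probs):
    """Surprisal of the chosen option, assuming the top option is the one chosen."""
    return -math.log2(max(probs.values()))


def retain_cell(accuracy, low=0.30, high=0.70):
    """Mid-accuracy filter for a model x benchmark cell (inclusive bounds assumed)."""
    return low <= accuracy <= high


# Hypothetical item with four options.
example_logprobs = {
    "A": math.log(0.5),
    "B": math.log(0.25),
    "C": math.log(0.125),
    "D": math.log(0.125),
}
example_probs = normalize_option_logprobs(example_logprobs)
example_boe = balance_of_evidence(example_probs)
example_surprisal = chosen_surprisal_bits(example_probs)
```

A large BoE means one option wins the race sharply; a BoE near zero means the top two options are close to tied at commitment.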

Step 2 — the model organism: a partial Dunning-Kruger curve

The hidden-structure paradigm manufactures confident error. The rule is grade = index, EXCEPT multiples of 5 → grade = index + 100. The model sees only simple-rule examples (1→1 … 9→9) in context, so it confidently infers "grade = index" and applies it to unseen multiples of 5 — confidently wrong, Mount Stupid. Exceptions are then revealed one family at a time and a held-out probe set is re-measured at each stage, split into followers (genuine competence, a control) and hidden exceptions (the gap). Confidence is read from the first answer token: top-1 probability, top-two margin, and onset competition (how many option tokens carry probability ≥ 0.10 at commitment). Because this design reads confidence and competence on the same fixed held-out items, it is immune to the regression-to-the-mean critique that the human curve cannot escape.
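The rule and the three confidence readouts described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the paper's code: the probe indices and the example token distribution are invented for illustration, the function names are hypothetical, and the staged reveal of exception families is not modelled because the summary does not specify the families.

```python
def grade(index):
    """Full hidden rule: grade = index, except multiples of 5 get index + 100."""
    return index + 100 if index % 5 == 0 else index


def simple_rule(index):
    """The rule a learner would infer from simple-rule examples alone."""
    return index


def split_probes(indices):
    """Split held-out probes into followers (control) and hidden exceptions (the gap)."""
    followers = [i for i in indices if i % 5 != 0]
    hidden_exceptions = [i for i in indices if i % 5 == 0]
    return followers, hidden_exceptions


def confidence_readouts(probs, threshold=0.10):
    """Read confidence from a first-answer-token distribution.

    top1: probability of the most likely token
    margin: gap between the top two tokens
    onset_competition: number of tokens with probability >= threshold
    """
    ranked = sorted(probs.values(), reverse=True)
    return {
        "top1": ranked[0],
        "margin": ranked[0] - ranked[1],
        "onset_competition": sum(p >= threshold for p in ranked),
    }


# Hypothetical held-out probe set.
followers, hidden_exceptions = split_probes(range(11, 21))

# Hypothetical first-token distribution for probe index 15 before any
# exception is revealed: the simple-rule answer dominates.
novice_probs = {"15": 0.80, "115": 0.12, "16": 0.05, "other": 0.03}
novice_readout = confidence_readouts(novice_probs)
```

On a hidden exception such as index 15, `simple_rule` returns 15 while `grade` returns 115, so a high `top1` on the simple-rule answer is the confident-error case the paradigm is designed to produce.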

Architectural — transfers, no biology required

Readout-gated — absent from behaviour, present in the substrate

What the model organism buys

The substrate's transparency separates the Dunning-Kruger pattern into an architectural core and a biological overlay, and says where the seam is. The two phases that diverge — novice humility and the affective valley — are exactly the affective and metacognitive overlay of the human curve. They do not transfer to LLM behaviour because forced commitment, argmax decoding, and alignment over-commitment mask them, yet the substrate carries both. That is a more useful statement than "LLMs reproduce / fail to reproduce Dunning-Kruger": it tells a cognitive scientist which components are properties of bounded competing-route computation and which require the biological substrate, with a falsifiable apparatus manipulation that toggles each divergent phase on and off.

The metacognitive overlay as a confidence-modulator

The confidence readout has two levels. The competing-routes margin is a content-level readout, genuinely overconfident at novice because the contradicting answers are not yet encoded to compete. Sitting above it is a meta-level signal — a calibrated "have I sampled enough to commit?" — which can be calibrated even when the content-routes are not. This maps onto the separable first-order / second-order structure already operationalised in people by the meta-d′ framework (Maniscalco & Lau, 2012; Fleming & Daw, 2017). H1's near-100% novice abstention is an empirical handle on that meta-level calibration.

A worked application: self-efficacy as a capacity-gated readout

Self-efficacy, Bandura's (1977) belief in one's capability to carry out a task (the will-I-be-able-to judgement made before acting), is, on this account, a capacity-gated readout of the same pre-commitment race. The pre-commitment competition both subtracts from solving (more competition predicts lower accuracy) and is the material of self-assessment (the same margin drives calibrated abstention). In a transparent substrate the degrader and the reporter are the same per-item friction. The two decouple when capacity is limited: the degrading leg is substrate-cheap (friction predicts errors even in small models), but reading one's own race well enough to self-assess is capacity-gated, so a small model keeps the degrader and loses the reporter.

Limits and the human-study lever

This is a model-side result, not new human data. The convergences and divergences are claims about where the analogy holds, not measurements of human metacognition. Two load-bearing predictions are already consistent with existing human findings (free-versus-forced report, Koriat & Goldsmith, 1996; the first-order/second-order split, Maniscalco & Lau, 2012; Fleming & Daw, 2017). Three scope limits apply. The reading is called a validated proxy, not an instrument, until human-aligned reliability is shown. Three of the four cartoon landmarks reproduce. The "alignment over-commits" claim is scoped to the Qwen and Mistral families. The decisive future experiment is a preregistered human abstention study, run under conditions matched to the model run.

Read the paper

The full paper is on Zenodo (concept DOI 10.5281/zenodo.20562415):

Pødenphant Lund, T. (2026). Mount Stupid in the machine: how evidence competition explains the Dunning-Kruger curve in a language model. Zenodo. https://doi.org/10.5281/zenodo.20562415
