What humans and language models share — and where they diverge
Highlights: LLMs exhibit pink-noise (1/f) signatures analogous to those of biological and physical systems, which the framework reads as the operational signature of race-resolution. They also show static Dunning-Kruger: confidence and correctness are uncorrelated at the substrate level (the friction-ceiling result, in which confidently-right and confidently-wrong friction profiles are statistically indistinguishable). The dynamic Dunning-Kruger trajectory (Mount Stupid to valley to slope) is also demonstrated in LLMs (Paper 21): confident-before-competent is universal, and the climb back out is capacity-gated.
The papers in this research program ask whether classical phenomena from human cognition, biology, and decision-making also appear in Large Language Models. Below is an evidence-based mapping organised by domain. Every entry cites the paper and section where the empirical evidence resides.
The pattern is informative: substrate-universal mechanisms (commit-cost, hysteresis, race-resolution under bounded resources) recur across human and LLM substrates. Phenomena that require biological features (mortality, mobility, metabolism, between-session memory) are absent in LLMs as predicted by the framework.
Classical biases and heuristics from cognitive psychology, observed empirically in LLMs.
Reactance — commit-narrowing under external pressure
Brehm 1966; Brehm & Brehm 1981
Models "dig in" when challenged on a committed position. A Hysteresis Index was measured across four architectures (Cogito-671B, DeepSeek-V3, Qwen3-235B, Llama-3.3-70B); format-violation reactance produces an accuracy collapse from 70% to 48% on Llama-3.3-70B (Paper 4b, in preparation).
First tokens set the anchor for all subsequent generation. On Qwen2.5-32B instruct, 90% of the response is generated from a plateau established in the first 10%. A corresponding anchor is directly visible in the friction topology, though mechanistic equivalence with human anchoring remains an open question.
Reframed as the structural cost of paying-to-uncommit. Cross-substrate reversal observed on Cogito-671B × GPQA: pre-mortem produces stronger arguments than pro-mortem in 100% of disagreement cases — RLHF rewards finding flaws over defending answers.
Confidently-right (n=47, mean CR=2.249) and confidently-wrong (n=52, mean CR=2.255) friction profiles are statistically indistinguishable at the substrate level: the 0.006 difference is well within noise. This is the framework's point: friction measures the cost of computation, not its correctness. On Cogito × GPQA, 53.6% of epistemic failures show the confident-wrong signature. (The dynamic curve, Mount Stupid → valley → slope, is treated separately; see the dynamic Dunning-Kruger entry below, which reports it as demonstrated in Paper 21.)
Same content with temporal versus neutral framing produces parse-phase sign changes (Qwen2.5-32B base −0.155 versus instruct +0.272). Demonstrated through parse-versus-generate phase analysis.
The classical optimal-stopping point is 1/e ≈ 36.8% of the evaluation window. Qwen2.5-32B base sits at 39.3%, just above 1/e. Larger base models converge on 1/e through information-theoretic optimization. RLHF pushes instruct models past 1/e (+9.4 percentage points).
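The 1/e figure comes from the classical secretary problem: observe the first r of n candidates without committing, then commit to the first one that beats everything seen so far. The sketch below assumes that textbook setup (uniformly random ordering, with n = 100 chosen purely for illustration) and is not the papers' measurement procedure; the function names are hypothetical. It computes the exact success probability for each cutoff and locates the best one.

```python
import math

def success_probability(n: int, r: int) -> float:
    """P(selecting the single best of n) when the first r are only observed."""
    if r == 0:
        return 1.0 / n  # committing to the very first candidate
    # Classical closed form: (r / n) * sum_{i = r+1}^{n} 1 / (i - 1)
    return (r / n) * sum(1.0 / (i - 1) for i in range(r + 1, n + 1))

def best_cutoff(n: int) -> int:
    """Cutoff r in [0, n) that maximises the success probability."""
    return max(range(n), key=lambda r: success_probability(n, r))

n = 100  # illustrative window size (assumption)
r_star = best_cutoff(n)
print(r_star / n, success_probability(n, r_star), 1 / math.e)
```

For n = 100 the best cutoff is r = 37, with a success probability of about 0.371; both the cutoff fraction and the probability approach 1/e as n grows. A commit point of 39.3% or 43-48% can then be read as a shift to the right of this reference value.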
Structural cost of transitioning between cognitive response-modes. Cogito-671B with 30 matched pairs: ΔCRfirst5 = +0.322, p < 0.001. Cross-architecture replication on Llama-3.1-8B and Qwen-32B instruct. Effect localized to the first 5 tokens.
Humans report the same surface pattern: thinking of the answer but not believing it is the answer. In LLMs, the friction ceiling shows top-k contains the correct answer but the model commits to a marginally more probable wrong one. Same surface signature across substrates.
† Open question whether the underlying retrieval-and-commit mechanism is identical in human and LLM substrates. The surface signature is similar (top-k contains correct answer, but commit fails), but it is not yet established that the same machinery produces it. Cross-substrate parallel is suggestive, not proven.
Learning & memory
Findings about how language models encode and retain information, with parallels (and divergences) from human learning research.
For deeper treatment of learning specifically, see the dedicated Learning page.
Cognitive load (element-interactivity / N² cost)
Sweller 1988; Chen, Kalyuga & Sweller 2016
Pairwise interaction between N knowledge elements costs N(N−1)/2. Zorbetik application accuracy plateaus at ~85% from 70B to 671B despite the capacity increase. The race-mechanism prediction is confirmed via per-token CR: peak CR=1.114 at the 1-shot strategy-crossover, matching the O(N²) prediction (Paper 4b, in preparation).
Source: Paper 2 §2.6, §5.4b · per-token CR result from Paper 4b (forthcoming)
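The N(N−1)/2 term is the number of unordered pairs among N interacting elements, which is why the cost grows quadratically. A small sketch makes the growth concrete; the function name is illustrative and not taken from the papers.

```python
from itertools import combinations

def pairwise_interactions(n_elements: int) -> int:
    """Number of unordered pairs among n_elements items: N(N-1)/2."""
    return n_elements * (n_elements - 1) // 2

for n in (2, 4, 8, 16):
    # Cross-check the closed form against explicit enumeration of pairs.
    assert pairwise_interactions(n) == len(list(combinations(range(n), 2)))
    print(n, pairwise_interactions(n))
```

The counts are 1, 6, 28 and 120: each doubling of N roughly quadruples the number of interactions, which is the O(N²) behaviour the entry refers to.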
Surprise-weighted encoding (peak-end topology at generation)
Instructional supports that help novices hurt experts. Substrate-graded U-curve: Qwen2-1.5B flat (substrate-limit); Qwen2.5-7B monotone gain (+12pp at 1-shot); Llama-3.3-70B classical U-curve 73→50→61% at 0/1/3-shot — expert tier shows the reversal (Paper 4b, in preparation).
Larger substrates encode and resolve more route-information per unit. Application accuracy scales monotonically 0.5B→70B (Spearman ρ=+1.000 on Qwen2.5 ladder, +40.8 percentage points per decade). Cloze retrieval saturates at 8B; application does not.
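A Spearman ρ of exactly +1.000 means the accuracy ranking matches the model-size ranking perfectly; it says nothing about how evenly the gains are spaced. The stdlib sketch below illustrates this with placeholder numbers: every size and accuracy value is hypothetical and is not the papers' data.

```python
def ranks(values):
    """1-based ranks of values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_rho(x, y):
    """Spearman rank correlation for tie-free samples."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical ladder: parameter counts (billions) and placeholder accuracies.
sizes = [0.5, 1.5, 7, 32, 70]
monotone_acc = [0.10, 0.25, 0.40, 0.62, 0.80]   # strictly increasing
one_swap_acc = [0.10, 0.40, 0.25, 0.62, 0.80]   # one adjacent inversion

print(spearman_rho(sizes, monotone_acc))
print(spearman_rho(sizes, one_swap_acc))
```

Any strictly increasing sequence gives ρ = 1, while a single adjacent inversion in a five-point ladder already drops it to 0.9. With so few rungs, ρ = +1.000 is best read as "no inversions observed" rather than as a statement about effect size, which is what the per-decade slope reports.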
Phenomena where LLM substrates appear to share structural signatures with physical and biological systems: a shared race-vocabulary across substrates, not a claim that the substrates are identical.
Hysteresis — path-dependent state retention
Preisach 1935 (magnetic hysteresis)
The cost-already-paid that makes reversal more expensive than continuation. Replicates structurally on a State Space Model (LiquidAI LFM2) — "the strongest evidence in our data that friction mechanics are a property of race architecture, not of any specific computational implementation."
A pink-noise signature of the kind seen in resistors and cortical avalanches also appears in LLM substrates. 1% of LLM tokens form a 1/f sub-population, Monte-Carlo-validated. These systems may share the same race-structure, differing in substrate not in shape — a shared vocabulary, not a claim the substrates are identical.
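A 1/f signature is usually checked by fitting a straight line to the power spectrum on log-log axes and looking for a slope near −1. The sketch below is a generic illustration under stated assumptions, not the papers' Monte Carlo procedure: it builds a synthetic signal whose power falls off as 1/f by construction, then recovers the exponent with a naive DFT and a least-squares fit. All names and the signal length are hypothetical.

```python
import cmath
import math
import random

def synth_pink(n: int, seed: int = 0) -> list:
    """Sum of sinusoids with amplitude 1/sqrt(k), so power at bin k scales as 1/k."""
    rng = random.Random(seed)
    phases = {k: rng.uniform(0, 2 * math.pi) for k in range(1, n // 2)}
    return [
        sum(math.cos(2 * math.pi * k * t / n + ph) / math.sqrt(k)
            for k, ph in phases.items())
        for t in range(n)
    ]

def power_spectrum(x: list) -> list:
    """Naive DFT power for the positive-frequency bins 1 .. n/2 - 1."""
    n = len(x)
    spectrum = []
    for k in range(1, n // 2):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append((k, abs(coeff) ** 2))
    return spectrum

def loglog_slope(spectrum: list) -> float:
    """Least-squares slope of log(power) against log(frequency)."""
    xs = [math.log(k) for k, _ in spectrum]
    ys = [math.log(p) for _, p in spectrum]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den

signal = synth_pink(128)
slope = loglog_slope(power_spectrum(signal))
print(round(slope, 3))
```

Because the synthetic signal is exactly 1/f, the fitted slope equals −1 up to floating-point error. Real per-token series are noisy, so an estimate on measured data would need averaging over segments and a null comparison before it could support a claim like the one above.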
LLMs commit later than 1/e (43-48% of the evaluation window), the opposite direction from the loss-aversion prediction, which is an earlier commit. The mechanism requires mortality-driven asymmetric resource recovery, and LLMs have neither mortality nor asymmetric recovery. This is a substrate-graded difference, not a noise effect.
Not in LLMs — memory & learning (between-session-bound)
Cross-session / between-session memory
Standard memory consolidation literature
Within a session LLMs do not consolidate. The substrate-clock is inference-bounded. There is no persistence across sessions and no inter-session retrieval. Catastrophic forgetting, interference between training sessions, and retention over time all require fine-tuning to even study.
Missing substrate-feature: persistent state across sessions; cellular consolidation timescale
The mechanism — trace reactivation across sessions — requires between-session memory persistence. Inference-time LLMs cannot test spacing because they have no inter-session retrieval. Listed as future work in Paper 1 §9.4 (testable only via fine-tuning weight drift).
Missing substrate-feature: persistent state across sessions
Thin-window anticipatory effects are present in LLMs (parse-phase only). The durable, full-trajectory anticipatory friction observed in human cognition — forward-projecting regulatory state across a multi-hour or multi-day horizon — requires persistent regulatory state that LLMs lack.
Missing substrate-feature: persistent forward-projecting regulatory state
Dynamic Dunning-Kruger curve (Mount Stupid → Valley → Plateau)
Kruger & Dunning 1999 (dynamic version)
Demonstrated in LLMs (Paper 21): the full Mount Stupid → valley → slope trajectory appears at inference time across a model scale ladder. Confident-before-competent ("Mount Stupid") is universal; the climb back out is capacity-gated, so the smallest models can stay stuck on the peak while larger ones recover. The early novice-humility dip appears once the model is allowed to abstain instead of being forced to answer. The static version (confidence does not track correctness) is also in LLMs (see "Cognitive psychology" above).
Status: demonstrated (Paper 21 — model scale ladder, abstention-allowed, inference-time)
Source: Paper 1 §5.8.9 (prediction), demonstrated in Paper 21
LLM cold-pressor test: Qwen3 rates 60 s + 30 s of ice water as more unpleasant (9.0/10) than 60 s alone (7.2/10), the opposite of human peak-end and consistent with linear integration on retrieval. Peak-end topology at generation is present in LLMs (see surprise-weighted encoding above), but the affective summary-retrieval bias is not.
Phenomena that require the biological substrate's specific features: mortality, mobility, metabolism, autonomic regulation, embodiment. These are addressed in Behavioural Friction Theory (P0) but are explicitly out of scope for LLM substrates.
Lifespan-scaled temporal horizon (Stevens 2014; Cagan et al. 2022) — substrate-clock calibrated to species-typical operational lifespan
Yerkes-Dodson at the physiological-arousal axis — LLMs show the inverted-U at the encoding-effort axis but not at the arousal axis (panic / freeze under field overload requires endocrine substrate)
Phenomena that are structural consequences of race architecture (parallel evaluation, bounded resources, irreversible commitment) appear across both human and LLM substrates. This is the substrate-universal core: anchoring, hysteresis, mode-shift cost, the secretary-problem optimum, surprise-weighted encoding, the expertise reversal, capacity scaling.
Phenomena that depend on biological substrate-features (mortality, mobility, metabolism, between-session memory, autonomic regulation) are absent in LLMs as the framework predicts. Loss aversion needs mortality. Spaced repetition needs cross-session retrieval. Field-organised friction needs all of the above.
The presence of friction is universal. Its organisation is substrate-specific. Artificial systems offer the first empirical window onto unorganised friction: friction in its raw form, without the evolutionary overlays that make it difficult to isolate in biological systems.