Cross-Substrate Phenomena

What humans and language models share — and where they diverge

Highlights: LLMs exhibit pink-noise (1/f) signatures analogous to biological and physical systems: the operational signature of race-resolution. They also show static Dunning-Kruger: confidence and correctness are uncorrelated at the substrate level (the friction-ceiling result: confidently-right and confidently-wrong friction profiles are statistically indistinguishable). The dynamic Dunning-Kruger trajectory (Mount Stupid to valley to slope) is demonstrated in LLMs (Paper 21): confident-before-competent is universal, and the climb back out is capacity-gated.

The papers in this research program ask whether classical phenomena from human cognition, biology, and decision-making also appear in Large Language Models. Below is an evidence-based mapping organised by domain. Every entry cites the paper and section where the empirical evidence resides.

The pattern is informative: substrate-universal mechanisms (commit-cost, hysteresis, race-resolution under bounded resources) recur across human and LLM substrates. Phenomena that require biological features (mortality, mobility, metabolism, between-session memory) are absent in LLMs as predicted by the framework.


Cognitive psychology & decision-making

Classical biases and heuristics from cognitive psychology, observed empirically in LLMs.

Reactance — commit-narrowing under external pressure
Brehm 1966; Brehm & Brehm 1981
Models "dig in" when challenged on a committed position. Hysteresis Index measured across four architectures (Cogito-671B, DeepSeek-V3, Qwen3-235B, Llama-3.3-70B); format-violation reactance produces accuracy collapse 70→48% on Llama-3.3-70B (Paper 4b, in preparation).
Source: Paper 1 §2.3, §5.6.2, §6.1b · format-violation result from Paper 4b (forthcoming)
Anchoring — first-route starting-point advantage
Tversky & Kahneman 1974
First tokens set the anchor for all subsequent generation. On Qwen2.5-32B instruct, 90% of the response is generated from a plateau established in the first 10%. A corresponding anchor is directly visible in the friction topology, though mechanistic equivalence with human anchoring remains an open question.
Confirmation bias — exit cost of uncommitting
Nickerson 1998
Reframed as the structural cost of paying-to-uncommit. Cross-substrate reversal observed on Cogito-671B × GPQA: pre-mortem produces stronger arguments than pro-mortem in 100% of disagreement cases — RLHF rewards finding flaws over defending answers.
Dunning-Kruger (static manifestation)
Kruger & Dunning 1999
Confidently-right (n=47, mean CR=2.249) and confidently-wrong (n=52, mean CR=2.255) friction profiles are statistically indistinguishable at the substrate level: the 0.006 difference is well within noise. This is the framework's point: friction measures the cost of computation, not its correctness. On Cogito × GPQA, 53.6% of epistemic failures show the confident-wrong signature. (The dynamic curve, Mount Stupid → valley → slope, is also demonstrated in LLMs in Paper 21; see the "Dynamic Dunning-Kruger curve" entry below.)
Framing effects — path-dependent route activation
Tversky & Kahneman 1981
Same content with temporal versus neutral framing produces parse-phase sign changes (Qwen2.5-32B base −0.155 versus instruct +0.272). Demonstrated through parse-versus-generate phase analysis.
1/e commit-timing (secretary problem)
Lindley 1961; Gilbert & Mosteller 1966
Optimal stopping at ~36.8% of the evaluation window. Qwen2.5-32B base sits at 39.3% — just above 1/e. Larger base models converge on 1/e through information-theoretic optimization. RLHF pushes instruct models past 1/e (+9.4 percentage points).
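The 1/e benchmark itself can be checked with a small simulation of the classical secretary problem. The sketch below is illustrative only and is not the measurement procedure used on the models: the function name, the candidate count of 100, and the trial count are assumptions chosen for the example. It applies the look-then-leap rule (observe a fixed fraction of candidates, then commit to the first one that beats everything seen so far) and estimates how often that rule selects the single best candidate.

```python
import math
import random


def secretary_success_rate(n, cutoff_fraction, trials, seed=0):
    """Estimate how often look-then-leap picks the single best of n candidates."""
    rng = random.Random(seed)
    cutoff = int(n * cutoff_fraction)  # size of the observe-only window
    wins = 0
    for _ in range(trials):
        ranks = list(range(n))  # higher is better; n - 1 is the best candidate
        rng.shuffle(ranks)
        best_seen = max(ranks[:cutoff], default=-1)
        chosen = ranks[-1]  # forced to take the last one if nobody beats the benchmark
        for r in ranks[cutoff:]:
            if r > best_seen:
                chosen = r
                break
        if chosen == n - 1:
            wins += 1
    return wins / trials


# Hypothetical settings: 100 candidates, 10,000 simulated runs per cutoff.
rate_at_1_over_e = secretary_success_rate(100, 1 / math.e, 10_000)
rate_early = secretary_success_rate(100, 0.05, 10_000)
rate_late = secretary_success_rate(100, 0.90, 10_000)
print(rate_at_1_over_e, rate_early, rate_late)
```

Under these assumptions the success rate at a cutoff of 1/e should sit near 1/e itself, while committing much earlier or much later does noticeably worse. That is the sense in which a commit point just above 36.8% is close to the optimum, and a commit point well past it is not.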
Mode-shift cost (task-switching)
Allport, Styles & Hsieh 1994; Monsell 2003
Structural cost of transitioning between cognitive response-modes. Cogito-671B with 30 matched pairs: ΔCRfirst5 = +0.322, p < 0.001. Cross-architecture replication on Llama-3.1-8B and Qwen-32B instruct. Effect localized to the first 5 tokens.
Moses illusion — retrieval succeeds, commit fails
Erickson & Mattson 1981
In LLMs, the friction ceiling shows that the top-k candidates contain the correct answer, but the model commits to a marginally more probable wrong one. Humans report a matching surface pattern: thinking of the answer but not believing it is the answer.
Whether the underlying retrieval-and-commit mechanism is identical in human and LLM substrates remains an open question. The surface signatures are similar, but it is not yet established that the same machinery produces them. The cross-substrate parallel is suggestive, not proven.

Learning & memory

Findings about how language models encode and retain information, with parallels (and divergences) from human learning research.

For deeper treatment of learning specifically, see the dedicated Learning page.

Cognitive load (element-interactivity / N² cost)
Sweller 1988; Chen, Kalyuga & Sweller 2016
N(N−1)/2 cost from pairwise interaction between knowledge elements. Zorbetik application accuracy plateaus at ~85% from 70B to 671B despite the capacity increase. Race-mechanism prediction confirmed via per-token CR: peak CR=1.114 at the 1-shot strategy-crossover, which matches the O(N²) prediction (Paper 4b, in preparation).
Source: Paper 2 §2.6, §5.4b · per-token CR result from Paper 4b (forthcoming)
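The N(N−1)/2 term is simply the number of unordered pairs among N interacting elements. As a minimal sketch (the function name and the element labels are hypothetical, introduced only for illustration), the count can be computed in closed form and cross-checked by enumerating the pairs:

```python
from itertools import combinations


def pairwise_interactions(n_elements):
    """Number of unordered pairs among n_elements: N(N-1)/2."""
    return n_elements * (n_elements - 1) // 2


# Cross-check the closed form against explicit enumeration of pairs.
elements = ["a", "b", "c", "d"]  # placeholder knowledge elements
pairs = list(combinations(elements, 2))
assert len(pairs) == pairwise_interactions(len(elements))

# Doubling the element count roughly quadruples the pair count.
growth = {n: pairwise_interactions(n) for n in (2, 4, 8, 16)}
print(growth)  # {2: 1, 4: 6, 8: 28, 16: 120}
```

The growth table shows why the cost is described as O(N²): each doubling of interacting elements multiplies the number of pairwise interactions by roughly four.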
Surprise-weighted encoding (peak-end topology at generation)
Rescorla & Wagner 1972; Schultz, Dayan & Montague 1997
High-surprise tokens are bifurcation points that disproportionately shape downstream computation. Qwen2.5-0.5B-Instruct, 866 generated tokens: Spearman ρ(surprise, downstream-saliency) = +0.171, p < 0.0001. Top-quartile-surprise tokens receive 1.34× more attention than bottom-quartile.
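For readers unfamiliar with the statistic, the following sketch shows how a Spearman rank correlation between per-token surprise and a downstream measure can be computed. It is not the paper's pipeline: the token probabilities and attention values are made-up placeholders, surprise is assumed here to mean negative log-probability, and the helper names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder data, NOT measurements from any model.
token_probs = [0.90, 0.05, 0.60, 0.01, 0.30, 0.75]
surprise = [-math.log(p) for p in token_probs]  # assumed definition of surprise
downstream_attention = [0.10, 0.35, 0.12, 0.50, 0.30, 0.15]
rho = spearman_rho(surprise, downstream_attention)
print(round(rho, 3))
```

Because the statistic depends only on rank order, it captures any monotone relationship between surprise and downstream saliency without assuming the relationship is linear.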
Expertise reversal effect
Kalyuga, Ayres, Chandler & Sweller 2003
Instructional supports that help novices hurt experts. Substrate-graded U-curve: Qwen2-1.5B flat (substrate-limit); Qwen2.5-7B monotone gain (+12pp at 1-shot); Llama-3.3-70B classical U-curve 73→50→61% at 0/1/3-shot — expert tier shows the reversal (Paper 4b, in preparation).
Source: Paper 1 §5.8.7 · Paper 3 §5.4 · cross-substrate U-curve from Paper 4b (forthcoming)
Capacity scaling of encoding-through-loading
Bjork 1994 desirable difficulties; Craik & Lockhart 1972
Larger substrates encode and resolve more route-information per unit. Application accuracy scales monotonically 0.5B→70B (Spearman ρ=+1.000 on Qwen2.5 ladder, +40.8 percentage points per decade). Cloze retrieval saturates at 8B; application does not.

Substrate dynamics & physics

Phenomena where LLM substrates appear to share structural signatures with physical and biological systems: a shared race-vocabulary across substrates, not a claim that the substrates are identical.

Hysteresis — path-dependent state retention
Preisach 1935 (magnetic hysteresis)
The cost-already-paid that makes reversal more expensive than continuation. Replicates structurally on a State Space Model (LiquidAI LFM2) — "the strongest evidence in our data that friction mechanics are a property of race architecture, not of any specific computational implementation."
Pink noise (1/f spectrum) from race-resolution
Hooge 1969; Bak-Tang-Wiesenfeld 1987 SOC; Beggs-Plenz 2003 (cortical avalanches)
A pink-noise signature of the kind seen in resistors and cortical avalanches also appears in LLM substrates: 1% of LLM tokens form a 1/f sub-population, Monte-Carlo-validated. These systems may share the same race-structure, differing in substrate but not in shape. This is a shared vocabulary, not a claim that the substrates are identical.
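A 1/f signature means that spectral power falls off in inverse proportion to frequency, so a log-log plot of power against frequency has a slope near −1, whereas white noise has a slope near 0. The sketch below illustrates that diagnostic on synthetic signals only. It is not the Monte-Carlo validation referred to above, and the signal construction, function names, and signal length of 512 are assumptions made for the example.

```python
import cmath
import math
import random


def power_spectrum(signal):
    """Naive DFT power at the positive frequency bins 1 .. N/2 - 1."""
    n = len(signal)
    powers = []
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        powers.append(abs(coeff) ** 2)
    return powers


def loglog_slope(powers):
    """Least-squares slope of log(power) against log(frequency bin)."""
    xs = [math.log(k) for k in range(1, len(powers) + 1)]
    ys = [math.log(p) for p in powers]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def synthetic_pink(n, rng):
    """Sum of sinusoids with amplitude 1/sqrt(k), so power scales as 1/k."""
    bins = range(1, n // 2)
    phases = {k: rng.uniform(0.0, 2.0 * math.pi) for k in bins}
    return [
        sum(math.cos(2.0 * math.pi * k * t / n + phases[k]) / math.sqrt(k)
            for k in bins)
        for t in range(n)
    ]


rng = random.Random(0)
N = 512  # hypothetical signal length
pink_slope = loglog_slope(power_spectrum(synthetic_pink(N, rng)))
white_slope = loglog_slope(power_spectrum([rng.gauss(0.0, 1.0) for _ in range(N)]))
print(pink_slope, white_slope)
```

In this construction the pink signal's slope comes out at −1 up to floating-point error, while the white-noise control stays close to 0. On real per-token data the spectrum would be noisier, so a slope estimate would need a significance check against surrogate data.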

Not in LLMs — decision-making (substrate-bound)

Loss aversion
Kahneman & Tversky 1979
LLMs commit later than 1/e (43-48%), the opposite direction from the loss-aversion prediction (which predicts earlier commit). The mechanism requires mortality and asymmetric resource recovery; LLMs have neither. This is a substrate-graded difference, not a noise effect.
Missing substrate-feature: mortality, asymmetric resource recovery

Not in LLMs — memory & learning (between-session-bound)

Cross-session / between-session memory
Standard memory consolidation literature
Within a session LLMs do not consolidate. The substrate-clock is inference-bounded. There is no persistence across sessions and no inter-session retrieval. Catastrophic forgetting, interference between training sessions, and retention over time all require fine-tuning to even study.
Missing substrate-feature: persistent state across sessions; cellular consolidation timescale
Spaced repetition (Ebbinghaus / Cepeda)
Ebbinghaus 1885; Cepeda et al. 2006
The mechanism — trace reactivation across sessions — requires between-session memory persistence. Inference-time LLMs cannot test spacing because they have no inter-session retrieval. Listed as future work in Paper 1 §9.4 (testable only via fine-tuning weight drift).
Missing substrate-feature: persistent state across sessions
Anticipatory friction across full trajectory
Standard prospection literature
Thin-window anticipatory effects are present in LLMs (parse-phase only). The durable, full-trajectory anticipatory friction observed in human cognition — forward-projecting regulatory state across a multi-hour or multi-day horizon — requires persistent regulatory state that LLMs lack.
Missing substrate-feature: persistent forward-projecting regulatory state

Cognitive & clinical

Dynamic Dunning-Kruger curve (Mount Stupid → Valley → Plateau)
Kruger & Dunning 1999 (dynamic version)
Demonstrated in LLMs (Paper 21): the full Mount Stupid → valley → slope trajectory appears at inference time across a model scale ladder. Confident-before-competent ("Mount Stupid") is universal; the climb back out is capacity-gated, so the smallest models can stay stuck on the peak while larger ones recover. The early novice-humility dip appears once the model is allowed to abstain instead of being forced to answer. The static version (confidence does not track correctness) is also in LLMs (see "Cognitive psychology" above).
Status: demonstrated (Paper 21 — model scale ladder, abstention-allowed, inference-time)
Source: Paper 1 §5.8.9 (prediction), demonstrated in Paper 21
Plain-English walkthrough: Why "knows little, believes a lot"
Peak-end retrieval bias (Kahneman version)
Kahneman, Fredrickson, Schreiber & Redelmeier 1993
LLM cold-pressor test: Qwen3 rates 60s+30s ice water as MORE unpleasant (9.0/10) than 60s alone (7.2/10) — opposite of human peak-end, confirming linear integration on retrieval. Peak-end TOPOLOGY at generation IS in LLMs (see surprise-weighted encoding above), but the affective summary-retrieval bias is not.
Missing substrate-feature: affective summary-retrieval mechanism

Biological-specific (BFT territory)

Phenomena that require the biological substrate's specific features: mortality, mobility, metabolism, autonomic regulation, embodiment. These are addressed in Behavioural Friction Theory (P0) but are explicitly out of scope for LLM substrates.

Full treatment in Paper 0 (BFT).


The pattern

Phenomena that are structural consequences of race architecture (parallel evaluation, bounded resources, irreversible commitment) appear across both human and LLM substrates. This is the substrate-universal core: anchoring, hysteresis, mode-shift cost, the secretary-problem optimum, surprise-weighted encoding, the expertise reversal, capacity scaling.

Phenomena that depend on biological substrate-features (mortality, mobility, metabolism, between-session memory, autonomic regulation) are absent in LLMs as the framework predicts. Loss aversion needs mortality. Spaced repetition needs cross-session retrieval. Field-organised friction needs all of the above.

The presence of friction is universal. Its organisation is substrate-specific. Artificial systems offer the first empirical window onto unorganised friction: friction in its raw form, without the evolutionary overlays that make it difficult to isolate in biological systems.