Friction as the Cost of Probabilistic Computation: A Generalised Substrate Theory
Pødenphant Lund, T. (2026b) · Preprint · Live on Zenodo
The softmax function in a transformer’s output layer is the Boltzmann distribution. Not “inspired by” it; the same mathematics. Friction Theory generalises behavioural friction to any race-architecture substrate, tested empirically across 15 LLM architectures with seven cross-architecture signatures recovered.
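The identity claimed above can be checked numerically. The following is a minimal sketch, assuming the mapping E_i = -z_i between logits and energies and identifying sampling temperature with kT; the function names and the example logits are illustrative and do not come from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits at a given temperature (shifted by the max for stability)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def boltzmann(energies, kT=1.0):
    """Boltzmann distribution p_i = exp(-E_i / kT) / Z."""
    weights = [math.exp(-e / kT) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

logits = [2.0, 0.5, -1.0]           # illustrative values, not model output
energies = [-z for z in logits]     # assumed mapping: E_i = -z_i

p_softmax = softmax(logits, temperature=0.7)
p_boltzmann = boltzmann(energies, kT=0.7)

# The two distributions agree up to floating-point error.
max_gap = max(abs(a - b) for a, b in zip(p_softmax, p_boltzmann))
```

Under this mapping a high logit corresponds to a low-energy state, and lowering the temperature concentrates probability on the lowest-energy candidate in both formulations.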
| Field | Value |
|---|---|
| DOI | 10.5281/zenodo.20012654 |
| Target venue | Behavioral and Brain Sciences (target article + open peer commentary) |
| Status | Preprint live; submission package consolidated |
| Length | ~66,500 words |
| Author | Tomas Pødenphant Lund [ORCID] |
TL;DR
This paper introduces Friction Theory (FT): a substrate-universal framework that treats friction as the irreducible information-processing cost of probabilistic computation. Behavioural Friction Theory (BFT, Paper 0) is recovered as the biological instantiation, with the formal nesting relation BFT ⊆ FT. The central reframing, within the FT framework: friction is not a psychological state or a metaphor from mechanics. It is formalised as the thermodynamic cost any computational substrate pays when its parallel evaluation of competing candidate states resolves into a single committed outcome. The lower bound is set by Landauer's principle. This is a theoretical reframing rather than an established consensus.
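For a sense of scale, Landauer's principle gives a minimum dissipation of k_B · T · ln 2 per erased bit. The sketch below computes that bound; treating the Shannon entropy of the candidate distribution as the number of bits erased at commitment is an illustrative assumption made here, not the paper's formal definition of friction, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact in the 2019 SI)

def landauer_bound_joules(temperature_kelvin, bits_erased=1.0):
    """Minimum heat dissipated when erasing `bits_erased` bits at temperature T."""
    return K_B * temperature_kelvin * math.log(2) * bits_erased

def entropy_bits(probabilities):
    """Shannon entropy of a candidate distribution, in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# One bit at room temperature: about 2.87e-21 J.
per_bit = landauer_bound_joules(300.0)

# ASSUMPTION (illustrative only): committing to one of four equiprobable
# candidates is treated as erasing the full 2 bits of uncertainty.
candidates = [0.25, 0.25, 0.25, 0.25]
floor_joules = landauer_bound_joules(300.0, entropy_bits(candidates))
```

The bound is a floor, not an estimate: physical hardware dissipates many orders of magnitude more energy per operation than this limit.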
The architecture producing friction is the race architecture: parallel evaluation of competing candidates under bounded resources with irreversible commitment. Building on the established race-model tradition (Vickers 1970; Ratcliff 1978; Usher & McClelland 2001; Cisek & Kalaska 2010), the framework extends this to substrate-universal scope: empirically, in the tested LLMs and biological exemplars, friction in race substrates decomposes into three principal dimensions (magnitude, distribution, rhythm), with broader generality remaining a working hypothesis. BFT's four fields (Safety, Meaning, Ability, Effort) emerge in biological substrates as the consequence of three additional constraints: mortality, mobility, metabolism. Non-biological race substrates exhibit friction without fields.
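The three ingredients named above (parallel evaluation, bounded resources, irreversible commitment) can be sketched as a toy independent-accumulator race. This is an assumption-laden illustration in the spirit of the cited race-model tradition, not the paper's model: the drift rates, threshold, Gaussian noise, and step budget are all hypothetical parameters chosen for the example.

```python
import random

def race(drifts, threshold=10.0, noise=1.0, max_steps=1000, seed=0):
    """Run a toy accumulator race.

    Returns (winner_index, steps_taken), or (None, max_steps) if the
    step budget runs out before any accumulator reaches the threshold.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(drifts)
    for step in range(1, max_steps + 1):      # bounded resources: a step budget
        # Parallel evaluation: every candidate accumulates evidence each step.
        for i, drift in enumerate(drifts):
            totals[i] += drift + rng.gauss(0.0, noise)
        leader = max(range(len(totals)), key=totals.__getitem__)
        if totals[leader] >= threshold:
            return leader, step               # irreversible commitment: race ends
    return None, max_steps

winner, steps = race([0.30, 0.25, 0.10])

# With noise switched off the race is deterministic: the fastest drift wins.
clean_winner, clean_steps = race([1.0, 0.5], threshold=10.0, noise=0.0)
```

In this toy setting, closely matched drift rates tend to lengthen the race and make the outcome more variable across seeds, which is the kind of cost the framework labels friction.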
The framework is tested on three large language model architectures (Cogito-671B, Qwen3-235B, Llama-3.3-70B) plus a paired Qwen2.5-32B base-versus-instruct comparison and a State Space Model (LiquidAI LFM2). Seven cross-architecture empirical signatures are recovered: iterative-pipeline dynamics matching the secretary problem's 1/e ≈ 36.8% optimum; parse-versus-generate phase decomposition; constructive versus destructive friction types; friction profiles as cognitive fingerprints; mode-shift entry and exit costs (Cohen's d = 0.83–0.88, p < 0.0001); reactance as thermodynamic hysteresis tracking RLHF intensity; and trailing-task forgetting under high mid-task load (Cohen's d = 1.2, the strongest cross-model effect). Cross-substrate data from Physarum polycephalum (Saigusa et al. 2008) and human decision-making (Laibson 1997) position these findings within a six-substrate gradient spanning forty orders of magnitude in characteristic timescale.
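The 1/e figure cited above is the classical secretary-problem result: reject roughly the first n/e candidates, then take the first one better than everything seen so far. The sketch below computes the exact success probability for each cutoff; it illustrates only the textbook optimum, not the paper's LLM measurements, and the function names are chosen for this example.

```python
import math

def success_probability(n, k):
    """Probability of selecting the best of n candidates when the first k
    are rejected and the next candidate beating all earlier ones is chosen."""
    if k == 0:
        return 1.0 / n  # no observation phase: the first candidate is taken
    # Candidate i is the overall best with probability 1/n and is reached
    # only if the best of the first i-1 lies among the k rejected ones.
    return (k / n) * sum(1.0 / j for j in range(k, n))

def best_cutoff(n):
    """Cutoff k that maximises the success probability."""
    return max(range(n), key=lambda k: success_probability(n, k))

n = 100
k_star = best_cutoff(n)
p_star = success_probability(n, k_star)

# Both the optimal cutoff fraction and the success probability approach 1/e.
cutoff_gap = abs(k_star / n - 1 / math.e)
probability_gap = abs(p_star - 1 / math.e)
```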
Several implications follow. BFT's four fields are reframed as evolutionary derivatives of the safety field under mortality, mobility, and metabolic constraint. Classical cognitive biases (anchoring, confirmation bias, sunk cost, status quo) are reinterpreted as thermodynamic necessities in any race architecture, extending the resource-rationality tradition (Gigerenzer; Lieder & Griffiths). A deeper unification follows from the same substrate reasoning: Kahneman's peak-end memory bias, dopaminergic reward-prediction-error signalling, Friston's free energy minimisation, and attention-weighted saliency in transformer LLMs (ρ = +0.17 between token surprise and downstream attention, empirically confirmed) are substrate-specific signatures of a single mechanism: surprise-weighted state retention.
Hysteresis is therefore not an error or a side-effect. It is the structural precondition for learning in any bounded probabilistic system: in a substrate that bears no trace of its own history, learning does not occur. Path-dependent state is what makes learning structurally possible.
Friction Theory is presented with cross-architecture empirical support spanning fifteen LLM architectures, slime-mould experiments, and multi-substrate biological literature. Cross-substrate validation invites collaboration with mathematical physics, cognitive neuroscience, and comparative biology.
Key results
- Three orthogonal friction dimensions (magnitude, distribution, rhythm) recovered via PCA across 15 LLM architectures; PC1 cross-architecturally invariant at Spearman ρ = 0.95
- Surprise-attention coupling: per-token Spearman ρ = +0.17 (p < 0.0001) between token surprise and downstream attention saliency — the mechanistic homologue of hippocampal surprise-driven replay, measured in artificial substrate
- Mode-shift entry cost: Cohen's d = 0.83–0.88 (instruct models), localised to the first 5 tokens, replicates across architectures
- Cross-substrate gradient: LLMs (inference-bounded), Physarum (hours), C. elegans (minutes to 40 hours), Drosophila, cephalopods, mammalian brains (seconds to decades)
- Friction ceiling (§9.1b): friction measures the cost of computation, not its correctness — a principled boundary on any friction-based method
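The surprise-attention coupling in the list above is a rank correlation between per-token surprise and an attention measure. The sketch below shows one way such a statistic could be computed, assuming surprise is taken as surprisal (-log2 p) of the emitted token; the token probabilities and attention scores are made-up illustrative numbers, and the paper's actual saliency measure is not reproduced here.

```python
import math

def surprisal_bits(p):
    """Surprisal of an event with probability p, in bits."""
    return -math.log2(p)

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for idx in order[i:j + 1]:
            result[idx] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# HYPOTHETICAL data: probability assigned to each emitted token, and an
# attention score later positions place on that token.
token_probs = [0.9, 0.05, 0.6, 0.3, 0.01]
attention = [0.10, 0.35, 0.12, 0.20, 0.25]

surprise = [surprisal_bits(p) for p in token_probs]
rho = spearman(surprise, attention)
```

Because Spearman's ρ depends only on ranks, any monotone rescaling of surprise or attention leaves the coefficient unchanged.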
Seven cross-architecture signatures — what was observed where
| Signature | Where observed | Effect size | Section |
|---|---|---|---|
| 3-dim PCA decomposition | 15 LLM architectures (0.5B–405B; dense, MoE, SSM, Liquid; base + instruct) | PC1 ρ = 0.95 cross-arch | §3, §5.6 |
| 1/e secretary-problem optimum | Base models (esp. Qwen2.5-32B base: 39.3%) | Convergence near 1/e ≈ 36.8% | §5.6.1 |
| Parse-vs-generate phase decomp. | 36 model×benchmark cells | parse > generate in 18/23 cells | §5.6 |
| Constructive vs destructive friction | Cogito-671B × GPQA, qwen25 × MATH | Distinction empirically detectable | §5.6 (companion P3) |
| Mode-shift entry/exit cost | Instruct models (matched base-instruct pairs) | Cohen's d = 0.83–0.88; null/reverse on base | §5.6.4 |
| Reactance follows RLHF intensity | Multiple instruct models | Reactance present in 6/15 architectures | §5.6.5 |
| Trailing-task forgetting (load) | Cross-model | Cohen's d = 1.2 (strongest cross-model effect) | §5.6.6 |
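Several effect sizes in the table are reported as Cohen's d. As a reading aid, here is a minimal sketch of the common pooled-standard-deviation form; whether the paper uses this variant or another (for example a paired-samples d) is an assumption, and the sample values are invented for illustration.

```python
import math
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled sample SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

# HYPOTHETICAL per-token friction scores in two conditions.
after_mode_shift = [2.0, 4.0, 6.0]
baseline = [1.0, 3.0, 5.0]

d = cohens_d(after_mode_shift, baseline)
```

By the usual convention, d near 0.8 counts as a large effect, so the values of 0.83–0.88 and 1.2 reported in the table would both fall in that range.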
Companion papers
- Paper 0 (BFT) — biological instantiation; the master document
- Paper 2 (Capacity Scaling) — tests the C-dimension prediction
- Paper 3 (Friction-Guided Inference) — practical application to LLM correction
- Paper 10 (Race all the way down/up) — extends FT scaffolding to physics-scope substrates