Friction as the Cost of Probabilistic Computation

Paper 1 · Pødenphant Lund (2026b) · Read on Zenodo

A slime mould has no brain, no nerves, not a single cell that looks like a brain cell. Saigusa and colleagues showed in 2008 that it can still learn to expect something: expose it to cold at regular intervals, and it starts to slow down before the cold arrives, as if it is counting on it. The same underlying pattern, that choosing between options costs something, turns up in slime mould, in brains, and in language models. This paper builds the formal account and tests it on 15 different language models. Seven matching signatures show up across all of them.

What it is about

The cost of choosing is not biological. It is mathematical. Resolving competing candidates costs something in any system that picks between alternatives under finite resources, not just in brains. That makes behavioural friction a special case of something more general. Behavioural Friction Theory (BFT) was about biological systems; here it is lifted into a broader framework, Friction Theory (FT), of which BFT is a special case: BFT ⊂ FT.

Why bother generalising? Because the underlying principle is not tied to any one substrate. It holds for neural networks. It holds for chemical kinetics. It may hold all the way down to quantum measurement (that is what Paper 10 investigates). If you have a race architecture, you have friction. If you have friction, you have a measurable cost. And from that cost a great many predictions follow.

The formal foundation

Friction is formally connected to thermodynamic free energy through Ortega & Braun's (2013) bounded-rational decision-making framework. This is not an analogy or a metaphor. It is the same mathematics as statistical mechanics.
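For orientation, the free-energy functional usually associated with this framework can be written as below. The notation is chosen here for illustration and is not taken from the paper: U is a utility, p_0 a prior (default) policy, and β an inverse temperature that sets how much deviation from the prior the system can afford.

```latex
F[p] \;=\; \sum_x p(x)\,U(x) \;-\; \frac{1}{\beta}\sum_x p(x)\,\log\frac{p(x)}{p_0(x)},
\qquad
p^{*}(x) \;=\; \frac{p_0(x)\,e^{\beta U(x)}}{\sum_{x'} p_0(x')\,e^{\beta U(x')}}
```

The maximiser p* has the Boltzmann form. The second term of F is the information cost of moving away from the prior, which is where a cost of choosing enters the formalism.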

For language models this connection is especially precise. The softmax function in a transformer's output layer is not "inspired by" the Boltzmann distribution. It is the Boltzmann distribution, with each logit playing the role of a negative energy. The temperature parameter in sampling is not "similar to" temperature in physics. It is the same parameter, sitting in the same place in the same formula. Token choice in autoregressive language models is bounded-rational decision-making in Ortega & Braun's sense, exactly. The mathematical inheritance is direct.
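The identity can be checked numerically. The sketch below is a minimal illustration, not code from the paper: it assumes energies are defined as negative logits and that the sampling temperature stands in for kT. The function names and the example logits are made up.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits with a sampling temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def boltzmann(energies, kT=1.0):
    """Boltzmann distribution: p_i proportional to exp(-E_i / kT)."""
    weights = [math.exp(-e / kT) for e in energies]
    z = sum(weights)  # partition function
    return [w / z for w in weights]

# Illustrative logits for three candidate tokens (assumed values).
logits = [2.0, 1.0, 0.1]

p_soft = softmax(logits, temperature=0.7)
# Assumption: energy = -logit, kT = sampling temperature.
p_boltz = boltzmann([-z for z in logits], kT=0.7)

for a, b in zip(p_soft, p_boltz):
    print(f"{a:.6f}  {b:.6f}")
```

The two columns agree to floating-point precision. Lowering the temperature concentrates probability on the top logit; raising it flattens the distribution.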

That gives us a measurable quantity: Competing Routes (CR). CR counts how many candidate tokens were within reach at each position in the model's output. High CR means the model was weighing many alternatives. Low CR means the model was committed to one. CR comes at no extra cost from any language-model API that returns token log-probabilities (for example, by requesting logprobs=True). It correlates with model errors. It changes systematically across architectures. It is the operational handle that makes the whole framework empirically testable.
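As a rough illustration of how such a count could be computed from log-probabilities, here is a minimal sketch. The paper's exact definition of CR is not reproduced in this summary, so the rule used here is an assumption: a candidate counts as "within reach" if its probability is at least a fixed fraction (ratio) of the top candidate's probability. The function name, the ratio parameter, and the example values are all hypothetical.

```python
import math

def competing_routes(top_logprobs, ratio=0.1):
    """Count candidates whose probability is at least `ratio` times the
    top candidate's probability.

    `top_logprobs` maps candidate token -> log-probability at one position.
    The threshold rule is an illustrative assumption, not the paper's
    definition of CR.
    """
    best = max(top_logprobs.values())
    cutoff = best + math.log(ratio)  # log(ratio * p_best)
    return sum(1 for lp in top_logprobs.values() if lp >= cutoff)

# Made-up example positions, written as probabilities for readability.
committed = {" Paris": math.log(0.97), " Lyon": math.log(0.02), " the": math.log(0.01)}
contested = {" red": math.log(0.35), " blue": math.log(0.30),
             " green": math.log(0.25), " a": math.log(0.10)}

print(competing_routes(committed))  # one dominant candidate
print(competing_routes(contested))  # several candidates still in play
```

With these values the committed position yields a count of 1 and the contested position a count of 4. In practice the dictionaries would be filled from whatever per-position top log-probabilities the API returns.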

[Figure: the race architecture. Axes: Time; Evidence / activation. Labels: threshold; winner; losing routes (suppressed); COMMIT → becomes observable action.]
The race architecture. Several candidate routes accumulate evidence in parallel under finite bandwidth. The first to cross the commit threshold wins and becomes observable behaviour; the rest are suppressed at a cost. The same architecture is instantiated across slime mould, brains, and transformers, with CR as the operational measure of how many routes were still in competition at the moment of commit.
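The race described above can be sketched as a toy simulation. This is a generic noisy-accumulator race written for illustration, not the paper's model: the drift rates, noise level, threshold, and function name are all assumptions.

```python
import random

def race(drifts, threshold=10.0, noise=0.5, seed=0, max_steps=10_000):
    """Run parallel noisy accumulators until one crosses `threshold`.

    Returns (winner_index, steps_taken, final_evidence). The winner is None
    if nothing crosses within `max_steps`.
    """
    rng = random.Random(seed)  # local generator keeps runs reproducible
    evidence = [0.0] * len(drifts)
    for step in range(1, max_steps + 1):
        # Every route accumulates its drift plus Gaussian noise each step.
        for i, drift in enumerate(drifts):
            evidence[i] += drift + rng.gauss(0.0, noise)
        # Commit to the leading route once it reaches the threshold.
        leader = max(range(len(evidence)), key=evidence.__getitem__)
        if evidence[leader] >= threshold:
            return leader, step, evidence
    return None, max_steps, evidence

winner, steps, evidence = race([0.30, 0.25, 0.10])
print("winner:", winner, "steps:", steps)
print("evidence at commit:", [round(e, 2) for e in evidence])
```

The evidence held by the losing routes at the moment of commit gives a simple picture of how contested the race was: when drift rates are close, the losers finish near the threshold.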

Empirical test: 15 architectures, seven signatures

The theory has been tested empirically on 15 different language-model architectures ranging from 0.5B to 405B parameters: dense transformers, mixture-of-experts models, state-space models, liquid neural networks, base models, and instruction-tuned models. Seven cross-architecture signatures were found:

Three friction dimensions, found everywhere

Principal component analysis (PCA) across all 15 architectures shows that friction has exactly three independent dimensions: magnitude, distribution, and rhythm.

The first dimension (magnitude) is practically identical across all architectures: Spearman's ρ = 0.95 across architectures. That is a striking finding. It indicates that the three-axis decomposition is not a property of any specific model or any specific training procedure. It is a property of the race architecture itself: instantiated in 15 different ways, it produces the same decomposition.
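For readers unfamiliar with the statistic, Spearman's ρ measures how well two sets of scores agree in rank order. The sketch below is a plain-Python illustration on made-up numbers; the two score lists are hypothetical "magnitude" values for the same five prompts on two models and are not data from the paper.

```python
def ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical per-prompt magnitude scores for two models.
model_a = [1.2, 3.4, 2.2, 5.0, 4.1]
model_b = [0.9, 2.8, 3.0, 4.6, 4.0]

rho = spearman(model_a, model_b)
print(round(rho, 3))
```

Here the two models disagree only on the order of the second and third prompts, which gives ρ = 0.9. A value near 1 means the models rank the prompts almost identically, even if their raw scores differ in scale.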

BFT is a subset of FT

The relationship between the two papers is precise: BFT ⊂ FT. BFT's four fields (Safety, Meaning, Competence, Effort) arise when three further biological constraints are added: mortality, mobility, metabolism. Non-biological race systems exhibit friction without fields. The presence of friction is universal across substrates; its organisation into four behavioural fields is specifically biological.

This is testable. Language models, which have none of the three biological constraints, show friction (measurable as CR) but not field-organised friction. The cross-architecture data are consistent with this prediction across all 15 architectures studied.

How far does it reach?

Cross-substrate data from slime mould (the anticipatory conditioning reported by Saigusa et al. 2008), C. elegans, flies, octopuses, and human brains place language models in a six-substrate gradient. Same architecture, varying substrates, similar phenomena. How far the theory reaches (whether it extends down to quantum systems and up to economic markets) is an open empirical question that the paper does not settle. Paper 10 tests the physics-downward direction explicitly.

What this paper enables

FT is the theoretical anchor the other papers build on.

You will find the full technical detail in the English version: Paper 1 (English technical). The full paper is on Zenodo: DOI 10.5281/zenodo.20012654.