Tomas Pødenphant Lund · Independent researcher, Aarhus
We humans learn, remember, and choose, and we so often do the opposite of what would have been smart. I study why from an unexpected angle: in language models, where the same patterns turn up and are easier to measure directly. Along the way I've also found practical ways to make those models better. The patterns reach much wider than the brain, but the human end is the easiest to recognise, so that's where we start.
The rest of this page goes deeper. If you want a one-paragraph version: every system that has to choose under finite time and resources runs an internal race between alternatives; one wins, and the others are suppressed at a cost. That cost is what I call friction, and the same structure shows up at every scale that's been examined. If you want the full picture with mathematics, datasets, and citations, the technical version is the right place.
What I'm actually studying
Every thought you have, every feeling, every movement of your hand: they are all expressions of computations your body performs.
Put that way, it sounds odd. "Computation" sounds cold and deliberate, like a calculator running, like someone sitting inside you working out the answer. That isn't what is happening. What is actually happening is closer to a sequence of probability outcomes.
A small example. Inside one of your cells, a stretch of DNA needs to pick up one starting molecule or another. Which one it picks up depends on what is floating nearby, and in what concentration. There is no decider. There is a distribution of molecules and a probability that one lands before another. The cell then produces whichever protein follows from that landing. Multiply that across millions of events per second across your body, and you have something that "computes" continuously without anyone running the show.
The same picture scales up. When you decide between two options, you are not running a deterministic calculation. You are letting competing candidate-answers race each other in your brain under conditions of finite time and finite resources. One of them wins. You experience the winner as "I chose X." You do not experience the race.
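To make the race picture concrete, here is a toy simulation. It is a minimal sketch under stated assumptions, not a model of any real neuron and not the formal apparatus from the papers. Each candidate accumulates noisy evidence, and the first to cross a threshold before the deadline wins. The function name `run_race` and all of its parameters (`drifts`, `threshold`, `noise`, `deadline`) are hypothetical and chosen only for illustration.

```python
import random


def run_race(drifts, threshold=1.0, noise=0.1, deadline=1000, seed=0):
    """Toy race between candidates (illustrative assumption, not the papers' model).

    drifts: average evidence gained per step by each candidate.
    Returns (winner_index, steps_taken). If the deadline arrives first,
    the current leader is declared the winner.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(drifts)
    for step in range(1, deadline + 1):
        for i, drift in enumerate(drifts):
            # Every candidate moves forward by its drift plus random noise.
            totals[i] += drift + rng.gauss(0.0, noise)
        if max(totals) >= threshold:
            return totals.index(max(totals)), step
    return totals.index(max(totals)), deadline


if __name__ == "__main__":
    wins = [0, 0]
    for seed in range(1000):
        winner, _ = run_race([0.05, 0.04], seed=seed)
        wins[winner] += 1
    # Typically the stronger candidate wins more often, but not always.
    print(wins)
```

With two nearly matched candidates, the weaker one still wins a share of the races. Only the winner is returned, which mirrors the point above: you experience the outcome, not the race.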
How learning actually works
Imagine dragging your finger through a thin layer of water on a tiled bathroom floor. The water shifts; a channel forms. Drag your finger through the same path again, and it's a fraction easier, because the channel is already there. Physicists call this hysteresis: the system carries traces of its own history.
Your brain works the same way. Routes you use a lot leave traces, and the traces make those routes more probable next time. That is what learning is, at the substrate level: not magic, not a uniquely biological mystery, just probability shifted through trace-accumulation.
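The channel picture can be sketched as a small reinforcement loop. This is a toy illustration under stated assumptions, not the mechanism from the papers: each route has a trace, a route is picked with probability proportional to the exponential of its trace (a softmax), and the chosen route's trace deepens by a fixed amount. The names `route_probabilities`, `practise`, and `gain` are hypothetical.

```python
import math
import random


def route_probabilities(traces):
    """Softmax over traces: deeper channels are more likely to be used."""
    peak = max(traces)  # subtract the peak for numerical stability
    weights = [math.exp(t - peak) for t in traces]
    total = sum(weights)
    return [w / total for w in weights]


def practise(traces, steps, gain=0.2, seed=0):
    """Pick a route by probability, then deepen the trace of the chosen route.

    `gain` is an assumed, arbitrary trace increment per use.
    """
    rng = random.Random(seed)
    traces = list(traces)  # do not modify the caller's list
    for _ in range(steps):
        probs = route_probabilities(traces)
        chosen = rng.choices(range(len(traces)), weights=probs)[0]
        traces[chosen] += gain
    return traces


if __name__ == "__main__":
    print(route_probabilities([0.0, 0.0, 0.0]))
    print(route_probabilities(practise([0.0, 0.0, 0.0], steps=50)))
```

All routes start out equally likely. After some practice the probabilities are no longer equal, and which route ends up favoured depends on the history of earlier choices, which is the hysteresis described above.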
Friction Theory is the formal name for this framework. The friction is the price the system pays (in time, energy, information) every time it has to resolve competing candidates into one committed outcome. Behavioural Friction Theory (BFT) is its biological version, applied to nervous systems and organised around four computational fields (Safety, Meaning, Capability, Effort) and five regulatory layers.
Even language models, which are literal computers designed to absorb information, cannot be taught by being shown more of it. Paper 2B shows this directly. Throw information at a model and you get a model that hallucinates confidently about whatever you threw at it.
If language models, designed to be teachable, cannot be taught by information-dumping, why do we assume humans can be? We have been thinking about it as motivation. It is physics. You do not learn information; you learn the trace information leaves.
The full version of that argument (what it means for how we teach, and why Bjork's "desirable difficulties" is physics rather than a pedagogical choice) is on the Memory page (plain) or the Learning page (more academic).
What I do, day to day
I study language models in order to understand humans, which is the opposite direction from what most people expect. The conventional view is that language models try to mimic human language, so studying them tells you about the mimic, not about humans themselves. I am arguing the opposite.
Many of the things we thought were uniquely human turn out to appear in language models too. The reason isn't that the models were trained to imitate human cognition. Both substrates share the same underlying architecture: parallel candidates racing under finite resources, with one winning. The same architecture produces the same characteristic behaviours:
Information overload — too much context hurts, just as it hurts you
Anchoring — the first word shapes the rest
Reactance — instructions activate the routes they try to prevent
Inverted-U on challenge — too little and too much both hurt
The 37% rule — base models naturally sample at the secretary-problem optimum
→ The full tour: LLMs aren't calculators (a dedicated page on the surprising things language models do, all of which you would expect to be typically human).
When language models reproduce these phenomena, they are not faking. They are telling us the phenomena are not specifically biological. They are structural consequences of any architecture that lets competing options race toward a deadline, with only one winner. The brain is one such system. A transformer is another. They look alike where their architecture is alike, and diverge where their architecture diverges.
What humans have that language models do not
The architecture is shared, but the implementation diverges. The places where humans and language models differ are just as informative as the places they overlap, because the divergences tell us which features are substrate-specific and which are universal.
Language models do not have loss aversion: the human tendency to fear losses about twice as much as we enjoy equivalent gains. Loss aversion comes from the fact that you can die if you choose wrong too often. Humans, mice, and bees all have mortality; language models do not. This is a new explanation for an old finding: loss aversion is not a universal cognitive law. It is a consequence of having a body that can die.
They do get surprised: words they did not see coming draw measurably more attention, and breaking the format makes them measurably resist. Both are measured directly. The still-open question is not whether they react, but whether they can tell the two apart: whether there is a layer that distinguishes "new information I should update on" from "something I want to push back against". That layer has not yet been seen in a language model.
They do not have memory between conversations. When you start a new conversation, the language model starts over. This means a huge area of human cognition (from Ebbinghaus's forgetting curve to spaced repetition) literally cannot be tested on language models, because they lack the basic substrate for it.
The free signal — how it works
The free signal is called Competing Routes (CR). It counts how many different answers the model is seriously considering at each step. When the model is sure of itself, one answer dominates and CR is close to 1. When it is torn between alternatives, several answers have similar probability, so CR can be 3, 5, maybe 10. That number can be read directly from the output of any language model whose API returns logprobs, at no extra cost, because it is already in the logprobs the API returns.
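This page does not give the formula for CR, so the following is only one plausible reading, offered as a sketch. It assumes CR can be approximated by the "effective number of candidates": the exponential of the Shannon entropy of the renormalised top-k distribution at one step. The function name `competing_routes` and that definition are assumptions; the definition used in Paper 3 may differ.

```python
import math


def competing_routes(logprobs):
    """Effective number of candidates from one step's top-k log-probabilities.

    ASSUMPTION: exp(Shannon entropy) of the renormalised top-k distribution.
    This is an illustrative stand-in, not the definition from the papers.
    """
    if not logprobs:
        raise ValueError("need at least one log-probability")
    probs = [math.exp(lp) for lp in logprobs]
    total = sum(probs)
    probs = [p / total for p in probs]  # top-k rarely sums to exactly 1
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return math.exp(entropy)


if __name__ == "__main__":
    confident = [math.log(p) for p in (0.97, 0.01, 0.01, 0.01)]
    torn = [math.log(0.25)] * 4
    print(competing_routes(confident), competing_routes(torn))
```

Under this assumed definition, one dominant answer gives a value close to 1, and four equally likely answers give a value of 4, which matches the behaviour described above.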
The mechanism is simple: when CR is high (the model is uncertain), ask it to reconsider under slightly different conditions; when CR is very high, let it abstain. It sounds trivial, but it lifts performance by 12 to 21 percentage points on 5 of 5 tested model-benchmark cells, and works on any language model with a standard API. The full explanation is on the Paper 3 page.
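The decision rule in the paragraph above can be sketched as a three-way switch. This is a minimal sketch, not the pipeline from Paper 3: the function name `choose_action` and both thresholds are hypothetical placeholders, and in practice the cut-offs would have to be calibrated per model and benchmark.

```python
def choose_action(cr, reconsider_at=3.0, abstain_at=8.0):
    """Map a Competing Routes value to an action.

    ASSUMPTION: both thresholds are made-up placeholders for illustration.
    """
    if cr >= abstain_at:
        return "abstain"      # very high CR: decline to answer
    if cr >= reconsider_at:
        return "reconsider"   # high CR: ask again under slightly different conditions
    return "answer"           # low CR: keep the first answer


if __name__ == "__main__":
    for value in (1.1, 4.0, 10.0):
        print(value, choose_action(value))
```

The design point is that the rule needs nothing beyond the CR number itself, so it can sit in front of any model that exposes the values CR is read from.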
Where humans and language models meet, and where they diverge
They meet on: anchoring, hysteresis, confirmation bias, mode-shift cost (the cost of switching between modes of thinking), 1/e secretary problem timing (the models land close to the theoretically optimal time to choose, around 37%), expertise reversal effect (instructions that help beginners hurt experts), surprise-weighted encoding, and many classical cognitive biases.
They diverge on: loss aversion (requires mortality), spaced repetition (requires between-session memory), and field-organised friction (Safety / Meaning / Capability / Effort, specific to organisms that can die, move, and consume metabolic energy).
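The 1/e timing mentioned above refers to the classical secretary problem: candidates arrive in random order, you may accept only the one in front of you, and the best known rule is to observe roughly the first 37% without committing and then take the first candidate that beats everything seen so far. The simulation below is a sketch of that textbook rule only. It says nothing about how language models were measured, and the function names are hypothetical.

```python
import math
import random


def secretary_trial(n, skip, rng):
    """One run: observe `skip` candidates, then take the first that beats them all."""
    scores = [rng.random() for _ in range(n)]
    best_seen = max(scores[:skip], default=float("-inf"))
    for score in scores[skip:]:
        if score > best_seen:
            return score == max(scores)  # success only if it is the overall best
    return False  # nothing beat the observed part, so the best was skipped


def success_rate(n=100, skip=None, trials=20000, seed=0):
    """Fraction of runs in which the rule picks the single best candidate."""
    if skip is None:
        skip = round(n / math.e)  # the classical 1/e cut-off, about 37% of n
    rng = random.Random(seed)
    wins = sum(secretary_trial(n, skip, rng) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    for skip in (0, 10, 37, 70):
        print(skip, success_rate(skip=skip))
```

Committing too early or too late both lower the success rate, and the theoretical optimum succeeds with probability approaching 1/e, about 0.37, so the simulated rate at the default cut-off should land near that value.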
The pattern shows up almost everywhere
Once you start looking for the race-architecture signature, you see it at scales that have nothing to do with brains or computers. The same characteristic curve (performance peaks in the middle, drops off at both ends) appears in:
Quantum particles deciding their state (the qubit decoherence window, 10⁻¹⁵ seconds)
Electrons drifting through metals (Ohm's law and Drude transport)
Chemical reactions choosing products (reaction kinetics)
Detectors picking up faint signals from noise (stochastic resonance)
Students learning new material (Bjork's "desirable difficulties" zone)
Whole organisms under stress (the Yerkes-Dodson curve)
Seven different things, spread across roughly forty orders of magnitude in time. They all show the same shape because they all face the same constraint: multiple options have to resolve into one outcome under finite resources. Paper 10 walks through the seven phenomena and the race-framework behind them.
The suggestive implication: biological cognition and physical systems may share the same race-structure, differing in substrate, not in shape. Humans hit a performance peak in the middle of the challenge range; qubits hit one between coherence and decoherence. Behavioural patterns we usually think of as "psychological" (the inverted U, hysteresis, anchoring, mode-shift cost) may turn out to be describable in the same race-vocabulary as the physical ones. This is a shared lens, not a claim that the substrates are identical.
That is the program. The papers develop the formal apparatus, the empirical signatures, the testable predictions, and the falsification criteria.
My papers
All papers are open-access preprints on Zenodo:
The 14 active papers, clustered by domain. Foundations (P0/P1) feed the three empirical-and-applied clusters. P6 unifies LLM-empirical with biology; P10 extends the substrate scaffolding to physics-scope.
Behavioural Friction Theory (BFT) — the biological foundation
The original version of the theory, focused on biological systems. 21 testable propositions, four functional fields (Safety, Meaning, Capability, Effort), and one mechanism, the RACE model, that ties it together. Just updated to version 7.
The clinical foundation. Addiction, rumination, ADHD, PTSD and more as different settings of one machine: races, traces and pressure. The core idea is to treat at the base, not the top. A framework, not medical advice.
How many small pushes across biological scales add up to disease. Reads cancer, autoimmunity, ME/CFS and treatment-resistant depression as one shared form, and explains why combination treatment beats a single target.
Five concrete trial designs that can test compound race pathology in the clinic: treatment-resistant depression (ketamine and psilocybin with structured follow-up), long COVID, and an autoimmune CAR-T treatment. Several can run on data already collected.
The preventive side. About people who carry a vulnerable biological base without crossing a diagnostic threshold, and cofactor support matched to a measured profile. A falsifiable hypothesis, not a supplement recommendation.
Friction Theory (FT) — the substrate-independent version
Generalises BFT to apply to any system with competing options under bounded time and resources — biological, artificial, possibly physical. Tested empirically on 15 language models with seven cross-architecture signatures.
Capacity Scaling — how language models "learn" from pure presentation
Two task types on the same knowledge: cloze (recover a fact) versus application (chain facts into a new result). Cloze saturates early; application scales monotonically across three orders of magnitude. The bottleneck migrates with capacity.
The practical paper. Strategy pipeline alone +7.7 to +20.8 pp; combined with calibrated abstention reaches +12 to +21 pp on the four cells where both were measured. On SimpleQA, the combined pipeline lifts Qwen3-235B past GPT-4o and GPT-4.1. Calibration costs about $1.50 per setup.
Race architecture as a shared vocabulary — the physics-scope paper
The most speculative paper. Proposes the race-architecture vocabulary as a unifying lens that organises existing bounded-commit-dynamics work from quantum measurement to chemical kinetics to human cognition: these may share the same race-structure, differing in substrate not in shape. Falsification criterion specified. "Not new physics — a new lens."
A substrate-grounded taxonomy of emotions. Integrates basic-emotions (Ekman, Plutchik) and constructed-emotion (Barrett) traditions via Friction Theory. Six moving parts generate ~45 distinct feeling-labels. Emotions = substrate signals; feelings = interpretive integrations. Three falsification criteria.
Operational Friction Theory — the operational mechanism
Specifies how friction is mechanistically resolved in any substrate satisfying the race-axioms. Four components: race-opening (the threshold for initiating a race), recursive resolution (multi-scale simultaneous resolution), manifested behaviour (the winning route becomes observable action), and thermodynamic termination. Behaviour is reframed as a manifested resolution-route — with implications for compulsive behaviour, OCD, tics, stress-habits, and burnout as one mechanism.
ICL as working memory, FT as long-term memory — why fine-tuned models hallucinate confidently
Why do fine-tuned LLMs hallucinate more confidently than counterparts given the same knowledge through in-context learning (ICL)? Each backward pass amplifies the winning route and presses alternatives below the noise floor; fine-tuning (FT in this entry, not Friction Theory) compresses the calibrated distribution as a structural consequence of cumulative gradient pressure. ICL preserves it. The distinction maps onto working memory versus long-term memory. Empirical anchor (Zorbetik, Qwen2.5-3B/7B): cloze gap 16–28 pp, log(CR_pos0) collapse 5.46→21.12, entropy→0. Generalises Paper 1's RLHF-paradox to all weight-update training.
Logic as reactance — why truth-value judgment may be probabilistic all the way down (even in humans)
Truth-value judgment in any race-architecture substrate may be the substrate's reactance signature: what the substrate does when input fails to fit what it has learned. Empirical anchor: a discontinuous cliff-event at the first content-token-position, observed on two LLM architectures (Qwen and Mistral, p < 10⁻¹⁷), eleven encoding-depth checkpoints (monotonic rising), eight floating-point substrates (calculator-overflow scaling law), and a preference-vs-truth cross-domain test. The N400 brainwave is reinterpreted as the biological-substrate readout of the same signature; the human N400 experiment is specified as the direct cross-substrate test. Cognitive dissonance, indoctrination, and expertise reversal as special cases at high encoding-depth.
Substrates encode experience, not information — you learn the friction, not the facts
An encoding-through-loading framework. Substrates code the friction of processing, not the information they were given. Eight experiments on six language models (Qwen2.5 1.5B/7B/32B, Llama-3.3-70B, Qwen3-235B, DeepSeek-V3) on a chemistry composition task recover the classical expertise-reversal U-curve (70B-class: 75→52→61% across 0/1/3-shot — the same shape educational psychology finds in human experts), per-token friction signal peaks at 1-shot, elaborated demonstrations reduce friction by closing the strategy-race, and format-mismatch produces a 22-pp accuracy collapse that the model sustains throughout the response.
Nine learning curves, one shape — a programmatic proposal
A programmatic proposal, not a finished theory. Nine famous curves from learning psychology (Yerkes–Dodson arousal, Vygotsky's zone of proximal development, Kalyuga's expertise-reversal, Bjork's desirable difficulties and spacing effect, Bengio's curriculum learning, the testing effect, Shannon–Berger rate-distortion, and Brehm-Festinger reactance) all show the same inverted-U shape. The paper suggests they might be describing the same underlying mechanism in different vocabularies, and lays out the measurements that would let researchers check.
Same content, wider track — an empirical pilot battery on LLM substrate (Paper 4)
Pilot-scale empirical-calibration companion to Paper 6. Eight friction-intensity axes tested via LoRA fine-tuning on fictive "Zorbetik" facts. Headline finding: same 25 facts trained with 1 paraphrase template → 38% paraphrase-robust recall; trained with 4 paraphrase templates → 94% (+56 percentage points under matched substrate, optimizer, content, and training budget). Four of five intra-session axes produce inverted-U parabolas (task-friction, chunking density, learning rate, sampling temperature); the fifth produces a framework-narrowing null that forces a four-way consolidation taxonomy. The HRP-3M cross-substrate direction (deep > passive ≈ surface) replicates on 5 of 6 substrate×paradigm pairings. Pilot-scale (per-condition n=4–30); planned v2 will scale up.
Why we value things we worked for, and when we commit to an answer — Paper 6BC
A programmatic proposal for two ways to read a substrate's race-architecture from its outputs alone. Readout 1: friction invested during encoding leaves a hysteresis trace that biases later comparative judgments. Six classical effort-value biases (IKEA effect, endowment effect, sunk-cost fallacy, generation effect, effort justification, effort heuristic) are argued to share this race-mechanic as one component of an effort-essential subset, not as a single-mechanism reduction of all six. Readout 2: where a language model commits to an answer in its response trajectory. Base models show 3.4× wider spread in when they settle on an answer than instruction-tuned counterparts, drift away from the secretary-problem 1/e optimum as task-interpretation deepens, and show a coupling between recognition and when they settle (r = 0.528) that the instruct model lacks (r = 0.104). The 1/e numerical match is recorded as a coincidence to be replicated, not a finding.
Why we have a self, model other minds, and feel free — Paper 7
One mechanism explains three things at once. Cognitive science treats self-modelling, theory of mind, and free will as three separate territories with three separate literatures. This paper argues they are three faces of one underlying machinery: a substrate that runs hypothetical futures in its head and uses those futures to weight what it does next. Includes a dissolutionist response to the libertarian-vs-deterministic free-will debate — the construction the debate defends (free-of-friction substrate) was never going to be instantiable in race-architecture in the first place.
For teachers, instructional designers, communicators, and anyone who has tried to explain something complex to someone else. Learning science has three big traditions (cognitive load theory, desirable difficulties, psychological safety) that almost never talk to each other. This paper argues they are all describing the same underlying physics from three different angles. From four substrate-level concepts (race-architecture, friction, hysteresis, Net Friction Rule) five classical learning findings fall out as derived consequences. Three classroom failure modes (dump, dilute, ambiguity-without-commit) get diagnostic recipes; four falsification conditions specified.
Compliance is behaviour, not information — Paper 20
We try to produce behaviour by giving people more information: one more policy, one more course, one more warning. But compliance is something you do, not something you know. More information rarely changes what people actually do. The paper explains why, through friction, and what it takes instead.