The Physics of Learning
Paper 16 · Pødenphant Lund (2026) · Read on Zenodo
I study language models to understand people. Sweller, Bjork and Edmondson built three of the most important theories of learning, but the three literatures almost never talk to each other. Each is about its own thing: how much working memory can hold, why effortful retrieval makes things stick, and why you learn less when you do not feel safe. This paper argues that the three describe the same underlying mechanism from three different angles. If you teach, design courses, or have to explain something complicated to another person, it is the same physics you are working with every time.
Three traditions, one mechanism
If you read learning research, you run into three big literatures that have mostly grown up in isolation:
- Cognitive load theory and multimedia learning (Sweller, Mayer). What working memory can and cannot hold; how to design materials that do not overwhelm it.
- Desirable difficulties and retrieval practice (Bjork, Roediger, Karpicke). Why effortful retrieval makes things stick; why being tested teaches better than rereading; why spacing beats cramming.
- Psychological safety and need prepotency (Maslow, Edmondson). Why people learn less when they do not feel safe; why the felt sense of safety competes with the work of learning.
Each of the three is empirically excellent. And each is mechanically thin: it tells you what happens very well, but not why it happens at the level where the learning actually takes place. Cognitive load theory says working memory has limits, but not why those limits exist or why they have exactly the shape they do. Desirable difficulties says effortful retrieval helps, but not what actually happens during retrieval that makes it stick. Psychological safety says safety beats content, but not what mechanism makes safety the thing that comes first.
The paper argues that all three are local consequences of the same constraint. It calls that constraint a limited-capacity race architecture, and the term is worth unpacking, because it is the idea everything else hangs on. Picture the system that does the learning (a brain, a neural network) as always having several possible answers in play at once. They compete, and the system has to pick one. But it can only hold a limited number in play at a time, because holding them costs resources. That is what is meant by a "race" under limited capacity. The things the three traditions describe fall out as consequences of exactly that constraint.
Four ideas, and what they explain
To get from "there is a constraint" to concrete predictions, the paper uses four ideas:
- Race architecture. The system runs several possible answers in parallel and commits to one. How many it can run at once is set by how much computing power it has available.
- Friction is the price of having several answers in play at the same time. When several candidates are alive at once, it costs something in time, energy and attention to keep them all open. That price is what friction theory calls friction.
- You remember the work, not the material (encoding-through-loading). What sticks is not what you were shown, but what your system itself did while it processed it. The trace is which costs you paid along the way.
- The system is not chasing the right answer, but the lowest total friction (the Net Friction Rule). Over time it pulls toward whatever has cost the least in the long run. What "feels right" is what has lowered total friction the most.
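The four ideas above can be made concrete with a toy model. This is a minimal sketch for illustration, not the paper's formalism: the `Candidate` class, the `run_race` function, the flat `hold_cost`, and the rule that the first `capacity` candidates are the ones that enter the race are all assumptions made here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    past_friction: float  # hypothetical: total cost this answer has incurred so far


def run_race(candidates, capacity, hold_cost=1.0):
    """Toy limited-capacity race (illustrative assumptions only).

    Only `capacity` candidates can be held in play at once, and each one
    held open costs `hold_cost`. The system commits to the candidate with
    the lowest accumulated friction, not to a "correct" one.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    if not candidates:
        raise ValueError("need at least one candidate")
    # Assumption: the first `capacity` candidates enter; the rest never race.
    in_play = candidates[:capacity]
    # Friction: the price of keeping every in-play candidate open.
    friction = hold_cost * len(in_play)
    # Net Friction Rule (toy version): commit to the lowest accumulated cost.
    winner = min(in_play, key=lambda c: c.past_friction)
    return winner.name, friction


answers = [Candidate("a", 3.0), Candidate("b", 1.0), Candidate("c", 0.5)]
narrow = run_race(answers, capacity=2)
wide = run_race(answers, capacity=3)
```

With a capacity of two, candidate "c" never enters the race, so the system commits to "b" even though "c" has the lower accumulated friction. Widening the capacity lets "c" win, but at a higher friction cost for holding three answers open.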
Out of those four ideas, five classic findings from learning research fall out almost on their own:
- Working memory has limits. Not some random fact about your brain, but a direct consequence of a limited-capacity race only being able to hold a few things in play.
- Testing works better than rereading (the testing effect). You encode by retrieving for yourself, not by being passively exposed to the material. Being tested IS the work that lays the trace.
- Spacing and interleaving. Two different kinds of race: one where you wait until the trace is starting to fade, one where you mix in related material so it is easy to confuse. Both work, each for its own reason.
- Experts and novices need different help (expertise reversal). Once you have locked onto a way of doing things, extra help pushes you outside the window where the resistance is just right. Help that benefits the novice harms the expert.
- Safety comes before content. When the system's safety response is active, it takes up the room in the race, no matter what you then try to teach it.
Why language models
Here is why I work with language models. You cannot look inside a brain while it learns, not without anaesthetic, and even then you do not see the single choice being made. A language model you can look straight into. It is a mechanical mirror, where the constraints I have described lie open. You can watch the competition between possible answers word by word. You can see that when you push the capacity hard enough, the system's ability to learn collapses suddenly rather than gradually. You can see that help which benefits an inexperienced model harms a more trained one. And you can see the difference between dense material (many answers in play, high load) and thinned-out material (few answers in play, low load).
None of that is "proof" that the same constraints hold in a biological brain. The paper is careful on that point: the language models are a mirror, not a load-bearing argument. The load is carried by the mechanical argument itself (Papers 1 and 4B). The language models show what the mechanism looks like when you can actually watch it work, and that gives concrete predictions for what you ought to be able to measure in a biological system.
Three ways teaching fails
The same single constraint gives three different ways teaching can go wrong. You will recognise all of them:
- Too much at once (the dump). The amount of material is larger than the system can sort through in time. This is the classic information overload.
- Too thin (the dilution). The right information is present, but so thinned out that the competition between answers never really gets going. When the race does not open, there is no work to encode, and so nothing sticks.
- Never a decision (ambiguity that never settles). Several possible readings are held open all the way through, without anything ever closing them. The system stays stuck in a high-resistance state and never gets the release when something finally falls into place, which is the thing that makes it stick.
Each has its own fix. Too much calls for cutting down. Too thin calls for concentrating the material so the competition gets going. Never-a-decision calls for making some choices for the learner, so the field of possibilities closes.
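The three failure modes and their fixes can be written as a small diagnostic. This is a sketch under stated assumptions: the function name `diagnose`, its three inputs, and the threshold of two live candidates for a race to count as open are hypothetical choices made here, not quantities the paper defines.

```python
def diagnose(live_candidates, capacity, committed):
    """Name the failure mode a lesson is in, with the matching fix.

    live_candidates: how many possible answers the material keeps in play
    capacity:        how many the learner can hold in play at once
    committed:       whether the lesson ever closes on one answer
    All three inputs are hypothetical stand-ins for things that would
    have to be estimated in practice.
    """
    if live_candidates > capacity:
        return "dump: cut the material down"
    if live_candidates < 2:
        # Assumption: a race needs at least two competitors to open.
        return "dilution: concentrate the material"
    if not committed:
        return "unsettled ambiguity: make some choices for the learner"
    return "no failure mode detected"


overloaded = diagnose(live_candidates=9, capacity=4, committed=True)
too_thin = diagnose(live_candidates=1, capacity=4, committed=True)
never_settles = diagnose(live_candidates=3, capacity=4, committed=False)
```

The order of the checks is itself a design choice: overload is tested first, because a race that exceeds capacity cannot be fixed by closing it sooner.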
The principle of matched resistance from Paper 6 shows up here in a variant: do not explain too thoroughly. When you explain everything, you remove the work the learner was meant to do, and it is the work that does the learning.
Practical implications
- Curiosity gaps are matched resistance you can build yourself. A good curiosity gap opens exactly the race the learner needs to run, and then the work sticks as learning.
- Say in advance what is coming. When you tell people what is ahead, the learner gets to say yes to it themselves. That sidesteps the pushback (reactance) a direct instruction would otherwise trigger, because we do not like being told what to do.
- Most of what fails in organisations is about meaning, not content. Most failed learning efforts get misdiagnosed as a content problem, when the real problem is that people cannot see why the material matters. It is the meaning that is missing, not the information.
- School material you cannot use afterwards is not a bug, but a design feature. Formal education deliberately teaches detached from concrete context. The paper argues that this is exactly why so much school material never finds its way into real use.
What would knock it down
An idea is only worth something if it can be wrong. So the paper says plainly what would knock it down. It is in trouble if:
- you cannot find the same mechanical fingerprint in a biological brain.
- you cannot measure the window where the resistance is just right.
- pushing the capacity does not produce the sudden collapse that is predicted, but only a steady decline.
- the word-by-word competition between answers has no counterpart in a biological system.
Those are not all the predictions, but they are the load-bearing ones. If even one of them does not hold, the account has to be corrected on exactly that point.
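One way to tell a sudden collapse from a steady decline in measured data is to ask how much of the total drop happens in a single step. The sketch below is an illustration only: the function names, the idea of scoring performance at increasing load levels, and the 0.5 threshold are assumptions made here, not a test the paper specifies.

```python
def largest_step_share(scores):
    """Share of the total performance drop that occurs in one single step.

    `scores` is performance measured at successively higher load levels.
    A value near 1 means almost all of the drop happened at once; a low
    value means the drop was spread across many steps.
    """
    if len(scores) < 2:
        raise ValueError("need at least two measurements")
    total_drop = scores[0] - scores[-1]
    if total_drop <= 0:
        return 0.0  # performance did not fall at all
    step_drops = [a - b for a, b in zip(scores, scores[1:])]
    return max(step_drops) / total_drop


def classify_decline(scores, threshold=0.5):
    """Label a curve using a hypothetical cut-off on the largest step."""
    if largest_step_share(scores) > threshold:
        return "sudden collapse"
    return "steady decline"


gradual = classify_decline([100, 90, 80, 70, 60])
abrupt = classify_decline([100, 98, 97, 20, 18])
```

In the first curve the largest step accounts for a quarter of the total drop; in the second, one step accounts for most of it. Real data would need a noise-aware version of this comparison.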
Why it matters
For education research. If the three big traditions are at bottom describing the same thing, they ought to be unifiable. The paper offers the common language to do that work.
For course designers. The three failure modes (too much, too thin, never a decision) give you a diagnostic language that points straight at what you then have to change.
For anyone who communicates inside an organisation. Most of what fails in organisational learning is about meaning, not the amount of information. That follows directly from the account here.
For anyone who builds courses. The window where the resistance is just right is the target. Too easy, and the competition between answers never gets going. Too hard, and it breaks down. It is in the middle that the learning sticks.
What I do not know
I want to be honest about where the line runs. What I can show directly is that the mechanism behaves as described in language models. That the same mechanism is at work in a human brain, I have good reasons to believe, but it is a conjecture, not a proof. The crux is whether you can measure the same fingerprints in a biological system, and that measurement still lies ahead of us.
Nor do I know exactly where the window of just-right resistance lies, because it moves with the material, with the learner, and with how far that person has already come. The paper gives a language for thinking about it, not a formula that tells you where to draw the line in your concrete situation. That is future work, and some of it can only be done together with people who measure in biological systems.
Related on this site:
- Paper 1 (Friction Theory) — the foundational framework that Paper 16 translates into teaching practice.
- Paper 6 core (Matched Friction Under Hysteresis) — the formal account of "just enough resistance". To explain too thoroughly is to overshoot the upper bound.
- Paper 4 (Same content, wider track) — the empirical experiments Paper 16 builds on.
- Paper 4B (Substrates encode experience) — why you remember the work, not the material, seen from the inference side.
- The Learning page — the broader story that we remember the work we do with the material.
- The Memory page — why piling on information does not teach.