Nine learning curves, one shape
Paper 6 (core) · Pødenphant Lund (2026i) · Read on Zenodo
I study language models to understand people.The Yerkes-Dodson curve was first shown in mice in 1908. Vygotsky's zone of proximal development was written down in the 1930s. Bjork's desirable difficulties, the spacing effect, expertise reversal, the testing effect: nine famous curves from nine separate research traditions, each with its own specialists and textbooks. And they all come out in exactly the same inverted U-shape: performance rises with engagement, peaks somewhere in the middle, falls again. This is a proposal, not a finished theory. It suggests the nine curves may describe the same underlying mechanism in nine vocabularies, and it spells out which measurements would have to be made to check whether that holds.
The puzzle
Walk into a psychology department and ask about the Yerkes–Dodson curve. It is the inverted U between arousal and performance: too little arousal and you fall asleep; too much and you panic; somewhere in the middle you do your best work. The curve was first shown in mice in 1908 and has been documented across hundreds of species and tasks since.
Walk into another psychology department and ask about Vygotsky's zone of proximal development. It is the inverted U between task difficulty and how much a child learns: too easy and it slides off; too hard and they give up; somewhere in the middle they learn. Vygotsky wrote in the 1930s.
Now ask about Kalyuga's expertise-reversal effect. It is the inverted U between instructional support and learning: the instructional support that helps novices often harms experts. The same support lands at different places on the curve for different learners.
Ask about Bjork's desirable difficulties (effortful retrieval helps; over-effortful retrieval breaks down). Ask about the spacing effect (too frequent re-exposure does not help; too spread out and the trace is lost). Ask about curriculum learning (easy-then-hard works; jumbled does not). Ask about the testing effect (retrieval beats re-reading, up to a point). Ask in an engineering department about Shannon–Berger rate-distortion (transmit more bits up to a limit; past it, fidelity collapses). Ask in social psychology about reactance (a little pushback deepens a conviction; too much breaks it; the relationship between depth-of-conviction and resistance-to-change is also inverted-U-shaped).
Nine different traditions, each with its own specialists, its own textbooks, its own paradigms. None is currently classified as a variant of any other. The surface phenomena look distinct: arousal is not task difficulty, scaffolding has nothing to do with spacing, and bit-rate is about something altogether different from retrieval practice.
And yet the curves come out in the same shape.
The proposal
The paper suggests the nine traditions may all describe the same underlying mechanism in nine vocabularies. The mechanism would be substrate-computational rather than psychological: any limited learning system that has to balance reception against learning against flow will produce an inverted U over engagement.
- Reception is what happens when the receiver is matched to the signal. Maximum information is carried across the channel. But a perfect match leaves no residue, no change of state, no learning. Perfect reception is the wrong regime for encoding.
- Learning requires the input to push past a threshold (the substrate's coercivity, in physics terms). Below the threshold the substrate returns to its earlier state once the input stops. Above the threshold the state is displaced, and the displacement lasts (this is hysteresis).
- Flow is what holds when the system is engaged but not overloaded. Push the mismatch too far past the threshold and the system can no longer resolve it: gradients become noise, parallel routes cannot settle, no coherent encoding happens.
The three principles compose: too matched and you receive perfectly but learn nothing; too mismatched and you learn nothing because the system overloads; the middle band is where learning happens.
The schema's formal expression involves three substrate parameters: c (coercivity, the threshold for a change of state), r (remanence, how much of each crossing lasts), and a coherence function between data and model. The optimum lies inside the transitional band, and three substrate parameters decide where the band sits and how steeply it falls off at the edges.
An honest account of where this stands
Tomas is unusually direct in the paper about what has and has not been done:
- The coherence function is not yet fully specified independently of outcome. The schema is therefore a vocabulary for substrate measurements, not a fully specified predictive equation.
- The three-parameter orthogonality (the claim that coercivity, remanence, and coherence can be extracted independently) has not yet been demonstrated on any single substrate. This is the central live falsifier the paper commits to.
- The empirical work to date is "preliminary computational probing" on a couple of language-model families, not empirical validation.
- Five of the nine traditions are counted as re-expressions: the schema describes them parsimoniously, but generates no new empirical content.
- Two traditions get candidate derivations with new quantitative predictions (Bjork's spacing effect falling out of the recovery-time function is the strongest example).
- Two traditions are speculative extensions: rate-distortion (the c→0 limit is structurally consistent with, but not a term-for-term recovery of, Shannon–Berger) and reactance (the biological inverted U over encoding-depth is a framework prediction rather than a result).
The paper closes by saying so explicitly: what this is is a programmatic statement of a candidate schema, a stratified account of empirical readiness, and an explicit list of the falsifiers it commits to. What this is not is a confirmed unifying theory, a validated reduction, or a replacement for existing native treatments of any of the nine traditions.
The strongest concrete claim
The paper's strongest grade-(b) claim is about Bjork's spacing effect. Combining the within-event learning cliff kinetics with the recovery-time function τ(I) gives a candidate derivation: spacing is not an independent learning law, but a derived temporal shadow of the same capacity limit that produces the cliff. Easy material recovers in minutes; deep material takes overnight; overloaded material takes days. One capacity threshold; three behavioural consequences that classical learning theory has treated as separate laws.
If this candidate derivation holds empirically, it would be a non-trivial structural reduction: a 70-year-old learning law would fall out of a more general physical principle, with quantitative predictions about how long spacing intervals would have to be for material of different cognitive depths.
Why it matters
For learning research. If the proposal is right, the nine traditions talk past each other not because they study different phenomena but because they use different vocabularies. A measurement vocabulary that lets researchers extract the three substrate parameters (coercivity, remanence, coherence) on any substrate would let new-versus-replication research on the inverted U be compared on equal footing across traditions.
For instructional design. The schema suggests that finding the optimum for a specific learner is a measurement problem: what is this substrate's coercivity? What is its remanence per event? What does the coherence function look like at this level of engagement? Answers would generate specific predictions about which kinds of support work and which do not for a given learner.
For artificial substrates. The paper offers concrete operationalisations of the three substrate parameters for silicon (language models): coercivity as learning-rate knee-point sharpness; remanence as post-event logprob margin; coherence as structured-versus-random difficulty contrast. These let the schema be tested on LLM substrates and provide a methodology for parallel work on biological substrates.
For Friction Theory as a whole. Paper 6 supplies the substrate-parameter vocabulary that Papers 1, 2B, 4B and companion papers 6B/6C/6D draw on. It is the "middle layer" between the substrate-universal theory (Paper 1) and the per-phenomenon empirical work (Paper 2B, 4B).
Citation
The full article is open-access on Zenodo. Concept DOI:
Read on Zenodo → · Technical version · Dansk version
Related on this site:
- Paper 1 (Friction Theory) — the substrate-universal theory whose vocabulary Paper 6 develops further.
- Paper 4B (Substrates Encode Experience) — the empirical companion that tests the within-event regime on language models.
- Paper 10 (Race architecture) — derives the inverted-U topology from the race axioms; layered with Paper 6, not redundant.
- The Learning page — framework-level treatment of encoding-through-loading.
- The Memory page — why dumping information does not teach.