Learning — what the framework predicts and finds

From hysteresis as precondition to signal-budget redistribution

Novel scope-condition: the classical expertise-reversal effect (instructional supports help novices but hurt experts; Kalyuga) appears in LLMs only above a model-capacity threshold. Llama-3.3-70B shows the U-curve cleanly; smaller models cannot, because they lack the capacity to be in the "expert" regime where additional examples become interference. This is the first substrate-graded statement of expertise-reversal.

Learning is one of the central themes across the paper series. The framework's claim: learning is a direct consequence of competition under load, not a separate cognitive module. Where you have race architecture (parallel evaluation, bounded resources, irreversible commit), you get learning when commitment leaves a path-dependent trace. Where you do not, you do not.

1. Hysteresis as the precondition for learning

Hysteresis (path-dependent state retention) has long been treated as an error or side effect to be minimised. The framework reframes it: across the bounded probabilistic substrates examined here, hysteresis appears to be the structural precondition for learning. In none of the cases tested so far does learning occur in a substrate that bears no trace of its own history. Path-dependent state is what makes learning structurally possible.

This applies across the bounded probabilistic substrates the framework examines:

[Figure: Two competing routes under load; the winner's path leaves a hysteresis trace. Before: one input, routes A and B at equal weight. After resolution under load: route A wins and is deepened; route B fades. The next time the same input arrives, A is more probable than B. That asymmetry, the trace, is what we call learning.]
Hysteresis as the precondition for learning. The path-dependent asymmetry between A and B after resolution is the trace; it is also what makes the next resolution easier to bias toward A. Substrates that cannot retain this asymmetry cannot learn.
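
The two-route picture can be made concrete with a toy simulation. This is a minimal sketch under stated assumptions, not the framework's own model: the softmax readout, the function names `softmax` and `resolve`, and the constants `trace` and `decay` are all illustrative choices that do not come from the papers.

```python
import math

def softmax(weights):
    """Convert route weights into win probabilities."""
    peak = max(weights.values())
    exps = {route: math.exp(w - peak) for route, w in weights.items()}
    total = sum(exps.values())
    return {route: e / total for route, e in exps.items()}

def resolve(weights, winner, trace=0.5, decay=0.1):
    """Return new weights after `winner` wins: winner deepened, others faded.

    `trace` and `decay` are illustrative constants (assumptions).
    """
    return {
        route: (w + trace) if route == winner else (w - decay)
        for route, w in weights.items()
    }

before = {"A": 0.0, "B": 0.0}        # two routes, equal weight
after = resolve(before, winner="A")  # route A wins under load

p_before = softmax(before)
p_after = softmax(after)
print("P(A) before:", p_before["A"])
print("P(A) after: ", p_after["A"])

# A substrate that retains nothing (trace=0, decay=0) shows no asymmetry.
memoryless = resolve(before, winner="A", trace=0.0, decay=0.0)
p_memoryless = softmax(memoryless)
print("P(A) without retention:", p_memoryless["A"])
```

With any positive `trace`, route A becomes more probable than route B on the next presentation; with retention switched off, the two routes stay tied, which is the sense in which a substrate without hysteresis has nothing to learn from.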

Hysteresis is empirically replicated cross-architecture in transformers (Cogito-671B, DeepSeek-V3, Qwen3-235B, Llama-3.3-70B) and in a State Space Model (LiquidAI LFM2). The cross-architecture replication is "the strongest evidence in our data that friction mechanics are a property of the race architecture, not of any specific computational implementation." The same conclusion follows for biological substrates.

Source: Paper 1 §2.3, §5.8.4

2. Encoding-through-loading

The standard cognitive-science view treats encoding as a separate process from retrieval and decision. The framework collapses this: what gets encoded is what wins route-competition under load. There is no separate encoding module: the same race-resolution machinery that produces decisions also leaves the trace that constitutes learning.

This connects to two classical findings:

Source: Paper 1 §6.4 · Paper 2

3. Capacity scaling

Two task types on the same knowledge base differentiate by capacity:

The bottleneck migrates with capacity: at 0.5B, retrieval fails; at 14B, retrieval is saturated and 36% of failures show the "retrieval succeeds, derivation fails" pattern, the friction-ceiling signature at the encoding level.
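
A failure breakdown of this kind could be tallied as follows. This is a hedged sketch: the record format (a pair of booleans per item) and the function name `failure_profile` are assumptions for illustration, and the example records are synthetic, not data from Paper 2.

```python
from collections import Counter

def failure_profile(records):
    """Share of each failure pattern among failed items.

    Each record is a hypothetical (retrieved, derived) pair:
    retrieved = the needed fact was recalled,
    derived   = the final answer was correct.
    """
    counts = Counter()
    for retrieved, derived in records:
        if derived:
            continue  # the item succeeded, so it is not a failure
        if retrieved:
            counts["retrieval_ok_derivation_failed"] += 1
        else:
            counts["retrieval_failed"] += 1
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()} if total else {}

# Synthetic example: one success, one derivation failure, two retrieval failures.
example = [(True, True), (True, False), (False, False), (False, False)]
profile = failure_profile(example)
print(profile)
```

Tracking the two failure patterns separately is what lets the bottleneck be located: a profile dominated by `retrieval_failed` points at retrieval, while a growing share of `retrieval_ok_derivation_failed` points at derivation.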

Implication for educational science: the same knowledge encoded at different capacity levels supports different task types. A learner who can do cloze cannot necessarily do application; the gap is not motivation, it is composition-bounded computation.

Source: Paper 2

4. "Catastrophic" forgetting is signal-budget redistribution

Catastrophic forgetting in fine-tuned LLMs has been interpreted as substrate damage: the claim that the base model "loses" knowledge during adaptation. This interpretation is empirically falsified.

The reverse-test (v13c, Paper 6 forthcoming): remove the LoRA adapter, and the base substrate recovers 179.5% of baseline performance. The base substrate is 100% intact; the adapter rebalances which routes win competition, but does not damage the underlying weights.

The mechanism is signal-budget redistribution: under fine-tuning, route-competition shifts toward the new task, away from the original. The original capability is preserved. It is just outranked. Removing the adapter restores the original ranking.
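
The structural point can be illustrated with a toy low-rank adapter. This sketch assumes the standard LoRA arrangement, in which the base weights stay frozen and the adapter contributes an additive low-rank term that is kept separate from them; the matrices, the input vector, and the helper names are toy assumptions and do not reproduce the v13c protocol.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def winning_route(weight, x):
    """Index of the route (row of `weight`) with the highest score for input x."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weight]
    return scores.index(max(scores))

# Toy values (assumptions): row 0 is the original route, row 1 the new-task route.
base = [[1.0, 0.0], [0.5, 0.5]]   # frozen base weights
lora_b = [[-1.0], [2.0]]          # rank-1 adapter factor B
lora_a = [[1.0, 1.0]]             # rank-1 adapter factor A
x = [1.0, 0.2]

snapshot = [row[:] for row in base]
adapted = add(base, matmul(lora_b, lora_a))   # effective weights W + BA

print(winning_route(base, x))      # 0: the original route wins without the adapter
print(winning_route(adapted, x))   # 1: the new-task route outranks it with the adapter
print(base == snapshot)            # True: the base weights were never modified
```

Because the adapter is additive and separate, dropping it returns exactly the original route ranking. The toy does not address why measured performance can differ from baseline after removal; it only shows that the ranking shift leaves the base weights untouched.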

This subsumes six previously-distinct phenomena under one mechanism:

Tomas's framing for the design rule: "want less — dilute; want more — protect."

Source: Paper 1 §5.8.4 (companion mechanism developed in Paper 6, forthcoming)

5. Calibrated retrieval-practice and Bjork's desirable difficulties

Bjork (1994) argued that desirable difficulties (effortful retrieval, spacing, interleaving) produce better long-term retention than easy practice. The framework provides a mechanism: difficulty raises route-competition, which deepens the hysteresis trace, which is what gets retained.

The prediction is testable in artificial substrates: calibrated retrieval-practice should preserve the recognition-to-commit slope, while calibration-naive training (RLHF-style suppression of friction) should flatten it.

Operational definition: the recognition-to-commit slope is the regression coefficient $\beta$ in $p_{\text{commit}}(\text{answer}) = \alpha + \beta \cdot p_{\text{recognition}}(\text{answer}) + \varepsilon$, where $p_{\text{recognition}}$ is the model's logprob on the correct answer under a low-stakes recognition prompt ("which of these is correct?") and $p_{\text{commit}}$ is the model's logprob on the correct answer under a high-stakes commit prompt ("answer in one word"). A well-calibrated model has $\beta \approx 1$: knowing-it predicts committing-to-it. RLHF-flattened models show $\beta < 0.5$: recognition decouples from commitment, which is the substrate-level signature of the friction-ceiling pattern.
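
The slope defined above can be estimated by ordinary least squares. The following is a minimal sketch, assuming per-item logprobs have already been collected under both prompt types; the function name and the example values are hypothetical and synthetic, not results from Paper 4.

```python
def recognition_to_commit_slope(p_recognition, p_commit):
    """Least-squares fit of p_commit = alpha + beta * p_recognition.

    Both arguments are per-item logprobs on the correct answer.
    Returns (alpha, beta).
    """
    n = len(p_recognition)
    if n != len(p_commit) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    mean_r = sum(p_recognition) / n
    mean_c = sum(p_commit) / n
    sxx = sum((r - mean_r) ** 2 for r in p_recognition)
    if sxx == 0:
        raise ValueError("recognition logprobs have zero variance")
    sxy = sum((r - mean_r) * (c - mean_c) for r, c in zip(p_recognition, p_commit))
    beta = sxy / sxx
    alpha = mean_c - beta * mean_r
    return alpha, beta

# Synthetic logprobs (assumptions, for illustration only).
recognition = [-3.0, -2.0, -1.0, -0.5]
coupled = [r - 0.2 for r in recognition]           # commitment tracks recognition
flattened = [-2.0 + 0.3 * r for r in recognition]  # commitment barely tracks it

_, beta_coupled = recognition_to_commit_slope(recognition, coupled)
_, beta_flat = recognition_to_commit_slope(recognition, flattened)
print(round(beta_coupled, 3), round(beta_flat, 3))
```

In the synthetic data the coupled series is constructed with slope 1 and the flattened series with slope 0.3, so the two fits land on either side of the 0.5 boundary named in the definition.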

Paper 4 v10d implements this as a 4-arm design with pre-registered protocol:

Outcome measures: ECE (calibration), slope (recognition-commit relation), and OOD defer-rate (whether the model knows when not to commit).
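
For the first outcome measure, expected calibration error is commonly computed with equal-width confidence bins. The sketch below uses that common formulation; whether Paper 4 uses the same binning, and the choice of 10 bins, are assumptions, and the example inputs are synthetic.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-size-weighted mean of |accuracy - mean confidence| per bin.

    confidences: probabilities in [0, 1] assigned to the chosen answers.
    correct:     booleans, True where the chosen answer was right.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need two equally long, non-empty sequences")
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, hit))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece

# Synthetic example: 90% confidence on four items, three of them correct.
ece_overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9],
                                               [True, True, True, False])
print(round(ece_overconfident, 3))
```

In the example, stated confidence (0.9) exceeds observed accuracy (0.75), so the error is the 0.15 gap between them; a model whose confidence matches its accuracy in every bin scores 0.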

Source: Paper 1 §5.8.7 · Paper 4 (forthcoming) · Paper 6 (forthcoming)

6. Expertise reversal effect

Kalyuga, Ayres, Chandler & Sweller (2003) found that instructional supports that help novices hurt experts. Worked examples accelerate beginner learning but slow expert performance, because experts have already encoded the pattern and the support now competes with their internal model.

The framework prediction: this should generalise to artificial substrates as a substrate-graded U-curve. Tested in Paper 4b Exp 1 across three model sizes:

The expert tier (70B) shows the same expertise reversal pattern Kalyuga reported for human experts. The substrate-graded scope-condition is novel: the U-curve appears only above a capacity threshold; below that, the substrate cannot represent enough alternatives for the conflict to manifest.

Source: Paper 1 §5.8.7 · Paper 3 §5.4 · Paper 4b (forthcoming)

7. What language models cannot test

Several classical learning phenomena are structurally untestable on inference-time LLMs because the substrate lacks features the human version requires:

Spaced repetition (Cepeda et al. 2006)
The mechanism — trace reactivation across sessions — requires between-session memory persistence. Inference-time LLMs cannot test spacing because they have no inter-session retrieval. Testable only via fine-tuning weight drift over training cycles.
Ebbinghaus forgetting curve (Ebbinghaus 1885)
Same constraint: requires retention measurement across time-separated sessions. The framework predicts items learned under high friction (deep hysteresis trace) should be forgotten more slowly — testable via LLM weight drift during fine-tuning, not via inference-time probing.
Cross-session interference
When learning new material interferes with previously learned material across sessions. Requires session-to-session memory; inference-time LLMs reset. Studied via continual learning paradigms with fine-tuning.

The pattern: between-session memory phenomena require fine-tuning experiments, not inference-time probing. This is a methodological constraint following directly from substrate features.

Source: Paper 1 §9.4 (future work)

8. Implications

For educational science: Bjork's desirable difficulties get a mechanistic foundation. Difficulty is not arbitrary: it is whatever raises route-competition enough to leave a deep hysteresis trace. This predicts which interventions transfer (those that raise route-competition specifically) and which do not (those that just add cognitive load without competition).

For AI training: friction profile during training should predict retention. Calibrated retrieval-practice should preserve recognition-commit slope; RLHF-style friction suppression should flatten it. Paper 4 v10d tests this directly.

For cognitive science: a bridge between human and artificial learning. Phenomena previously studied in humans (Bjork, Von Restorff, Craik & Tulving, expertise reversal) become measurable in substrates where the friction signal is observable. Cross-substrate validation becomes possible.

For clinical translation: signal-budget redistribution as a mechanism for retrieval failure (Paper 8c forthcoming). Dementia presents as failure to commit despite preserved knowledge: the same friction-ceiling pattern observed in LLMs. Diagnostic implication: sub-threshold cuing tests should distinguish encoded-but-unreachable from unencoded.
