Same Content, Wider Track: Empirical Calibration of Friction Theory on LLM Substrate
Paper 4 · Pødenphant Lund, T. (2026g) · Preprint · Live on Zenodo
Pilot-scale empirical-calibration companion to Paper 6 (matched-friction-under-hysteresis), tested on LLM substrate via LoRA fine-tuning on fictive "Zorbetik" facts designed to eliminate pretraining priors. Four of five intra-session friction-intensity axes produce inverted-U parabolas; the fifth produces a framework-narrowing null that forces refinement of the consolidation taxonomy. The v12b paraphrase-augmentation finding (same 25 facts, 1 template → 38% recall, 4 templates → 94% recall under matched substrate) is the empirical anchor for "varied friction = wider track."
| DOI (concept) | 10.5281/zenodo.20059859 |
| Status | Preprint live as v1, 2026-05-27 (pilot-scale) |
| Cite-letter | 2026g |
| Per-condition n | 4–30 (mixed-model, mixed-paradigm portfolio) |
| Author | Tomas Pødenphant Lund [ORCID] |
TL;DR
Friction Theory (FT, Paper 1) is the substrate-universal framework for bounded probabilistic computation systems satisfying the race axioms. The matched-friction-under-hysteresis formulation (Paper 6) predicts an inverted-U between substrate-level friction intensity and encoding outcome on any axis where friction scales monotonically. Behavioural Friction Theory (BFT, Paper 0) is the biological specialisation. Paper 4 is the empirical-calibration companion on LLM substrate.
Methods. Eight axes (six friction-intensity, two distribution-shape) were tested by fine-tuning Qwen2.5-7B-Instruct (4-bit NF4 LoRA, r=16, α=32) on fictive "Zorbetik" facts. Cross-substrate-paradigm tests were run on Mistral-7B-Instruct-v0.3, Llama-3.1-8B-Instruct, Llama-3.1-8B-base, and Qwen2.5-3B-Instruct via FT trajectory and ICL-inference paradigms.
Headline results. Four of five intra-session axes produce genuine inverted-U parabolas:
- Task-friction (v10c, R_LLM ratio ordering: deep_reason 0.66 < passive 0.75 < surface_sort 0.95) — cross-family replicated.
- Chunking density — R² = 0.89.
- Learning rate — catastrophic cliff with σ = 0.019 log-units.
- Sampling temperature — R² = 0.985, peak T ≈ 0.4–0.5.
The fifth axis (violation magnitude, v11c) produces a framework-narrowing null on behavioural accuracy. The v12 result (differential preservation under orchestrated replay) confirms Level-0 encoding-differentiation via distinctiveness-confidence rather than surprise-magnitude-proportionality. The Bjork-spacing bonus parabola (v4c/v4d) produces a formally undecided null, traced to the architectural absence of internal between-session replay.
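The R² values and peak locations quoted for the chunking-density and temperature axes come from parabola fits across conditions. The following is a minimal sketch of that kind of fit, assuming an ordinary least-squares quadratic; the paper's actual fitting procedure is not described here, and the data points are synthetic, generated from a known parabola purely for illustration. The helper names `fit_quadratic` and `is_inverted_u` are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c. Returns (a, b, c, r2)."""
    s = [sum(x ** k for x in xs) for k in range(5)]  # s[k] = sum of x^k
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Normal equations for the unknowns (a, b, c), solved by Cramer's rule.
    m = [[s[4], s[3], s[2]],
         [s[3], s[2], s[1]],
         [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = det3(m)
    coeffs = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        coeffs.append(det3(mc) / d)
    a, b, c = coeffs
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a * x * x + b * x + c)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, c, 1.0 - ss_res / ss_tot


def is_inverted_u(xs, ys):
    """True if the fitted parabola opens downward and peaks inside the sweep."""
    a, b, _, _ = fit_quadratic(xs, ys)
    if a >= 0:
        return False
    peak = -b / (2 * a)
    return min(xs) < peak < max(xs)


# Synthetic sweep (NOT the paper's data): an exact parabola peaking at 0.45.
temps = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
scores = [0.9 - (temp - 0.45) ** 2 for temp in temps]
a, b, c, r2 = fit_quadratic(temps, scores)
peak_temp = -b / (2 * a)
```

Requiring both a negative leading coefficient and an interior peak separates a genuine inverted-U from a monotone trend that merely has some curvature.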
Distribution-shape extension (§4.7, NEW). The seventh and eighth axes sweep training-distribution shape rather than friction intensity. The v12b paraphrase-augmentation experiment provides the cleanest within-content demonstration of distribution shape as a substrate-friction parameter: the same 25 facts trained with 1 paraphrase template yield 38% paraphrase-robust memorisation; trained with 4 paraphrase templates, they yield 94% (a lift of +56 pp under matched substrate, optimizer, total facts, and total epochs). This is the empirical anchor for "varied friction = wider track" inherited from the human desirable-difficulties literature (Schmidt & Bjork 1992; Rohrer & Pashler 2007).
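The matched 1-template versus 4-template contrast can be sketched as follows. This is an illustration only: the facts and templates are placeholders, and the rotation scheme (one rendering per fact per epoch, so that examples per epoch stay equal across conditions) is an assumption about how "matched total facts and total epochs" might be implemented, not the paper's documented procedure.

```python
# Placeholder facts and templates: hypothetical, not the paper's materials.
FACTS = [(f"zorbetik-item-{i}", f"value-{i}") for i in range(25)]
TEMPLATES = [
    "The Zorbetik value of {subject} is {value}.",
    "{subject} has the Zorbetik value {value}.",
    "In Zorbetik terms, {subject} corresponds to {value}.",
    "Q: What is the Zorbetik value of {subject}? A: {value}.",
]


def epoch_examples(facts, templates, epoch):
    """One rendering per fact per epoch, rotating through the templates."""
    return [
        templates[(i + epoch) % len(templates)].format(subject=s, value=v)
        for i, (s, v) in enumerate(facts)
    ]


def distinct_surface_forms(facts, templates, n_epochs):
    """Count the distinct training strings seen over n_epochs."""
    seen = set()
    for epoch in range(n_epochs):
        seen.update(epoch_examples(facts, templates, epoch))
    return len(seen)


def lift_pp(before_pct, after_pct):
    """Difference between two percentages, in percentage points."""
    return after_pct - before_pct


single = distinct_surface_forms(FACTS, TEMPLATES[:1], n_epochs=4)
varied = distinct_surface_forms(FACTS, TEMPLATES, n_epochs=4)
reported_lift = lift_pp(38, 94)  # recall figures quoted above
```

Under this scheme both conditions present 25 examples per epoch; only the number of distinct surface forms differs (25 versus 100 over four epochs), which is the manipulation the reported +56 pp lift is attributed to.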
Three Paper-6 refinements forced by Paper 4 data:
- σ is strongly axis-dependent (~13× span between LR and temperature) → Paper 6 §4.11 axis-specific kinetics.
- The Michaelis-Menten × sigmoid-gate formula requires a baseline-offset extension for axes on already-trained substrate → Paper 6 §4.5.
- The v11c null + v12 differential preservation force a four-way consolidation taxonomy (Paper 6 §4.10): functional consolidation (BFT, biological), orchestrated mechanical analog (FT, LLM + replay), pure mechanical substrate (FT, LLM without replay), tool-augmented mechanical consolidation (FT, LLM + retrieval, untested but framework-predicted).
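To make the second refinement concrete, here is one functional form consistent with the phrase "Michaelis-Menten × sigmoid-gate with a baseline offset". It is an assumption, not the formula from Paper 6: the exact parameterisation, the log10 axis for the gate, and every parameter name (`vmax`, `k`, `theta`, `baseline`) are hypothetical. Only σ = 0.019 log-units is taken from the learning-rate result above.

```python
import math


def sigmoid_gate(log_f, log_theta, sigma):
    """Falls from 1 to 0 as log_f crosses log_theta; sigma sets the cliff width."""
    z = (log_f - log_theta) / sigma
    # Numerically stable evaluation of 1 / (1 + exp(z)).
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def encoding_outcome(f, vmax, k, theta, sigma, baseline=0.0):
    """Saturating gain times a collapse gate, plus an additive baseline (assumed form)."""
    saturating = vmax * f / (k + f)  # Michaelis-Menten rise with friction f
    gate = sigmoid_gate(math.log10(f), math.log10(theta), sigma)
    return baseline + saturating * gate


# Illustrative parameters only; sigma is the reported LR cliff width.
params = dict(vmax=1.0, k=1.0, theta=10.0, sigma=0.019, baseline=0.2)
low = encoding_outcome(0.01, **params)
mid = encoding_outcome(5.0, **params)
high = encoding_outcome(1000.0, **params)
```

With these values the outcome rises with friction, then collapses to the baseline once friction passes `theta`, giving the inverted-U shape. A small σ produces a cliff rather than a gentle decline, and the baseline term lets an already-trained substrate sit above zero at both extremes.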
Distribution-shape cross-substrate matrix. HRP-3M direction (deep > passive ≈ surface in correctness or first-token CR distribution) is confirmed on 5 of 6 substrate × paradigm pairings spanning two model families (Qwen, Llama), three sizes (3B/7B/8B), and two paradigms (FT trajectory + ICL inference). The 6th pairing is ceiling-masked under the pre-flight test-informativeness criterion, consistent with that criterion.
Two substrate-distinguishing friction-type axes (§11.6). Race-level friction including reactance is substrate-universal (Paper 14, Paper 2B, Paper 5 empirical anchors). Two axes distinguish LLM substrate from biological substrate: self-continuity (confirmed substrate-absent via §4.8 v10g), and surprise (biology-specific via §4.3 v11c null + §11.5 "Dunning-Kruger without Mount Stupid" theoretical extension).
Scope and status. Paper 4 is a pilot-scale preprint timestamping preliminary empirical observations on LLM substrate. Per-condition n ranges from 4 to 30 across a mixed-model, mixed-paradigm portfolio. Individual findings should be read as exploratory pilot results that constrain framework development, not as definitive cross-substrate calibrations. A planned v2 revision will scale per-condition n on the core axes (see §10.1). A pilot scale-up attempt (Together SFT, n=100/condition at both LoRA r=16 and r=64) produced 0% accuracy, a substrate-saturation finding documented in §10.3.
Key findings
- Task-friction inverted-U (v10c) — deep_reason 0.66 < passive 0.75 < surface_sort 0.95, cross-family replicated (Qwen + Llama).
- Chunking density inverted-U — R² = 0.89 on the cross-condition parabola.
- Learning rate catastrophic cliff — σ = 0.019 log-units; sharpest cliff of any axis tested.
- Sampling temperature inverted-U — R² = 0.985, peak T ≈ 0.4–0.5.
- Violation magnitude null (v11c) — framework-narrowing; forces consolidation-taxonomy refinement; not interpreted as framework-disconfirmation.
- v12b paraphrase-augmentation lift — same 25 facts, 1 template → 38% paraphrase-robust recall; 4 templates → 94% (+56 pp under matched substrate / optimizer / facts / epochs).
- HRP-3M cross-substrate confirmation — 5 of 6 substrate × paradigm pairings replicate the deep > passive ≈ surface direction; 6th is ceiling-masked per pre-flight test-informativeness criterion.
- Self-continuity substrate-absent — v10g confirms LLM substrate lacks the self-continuity axis present in biological substrates.
- Surprise biology-specific — v11c null + §11.5 theoretical extension narrows surprise-driven retention to biological substrates with consolidation machinery LLMs do not instantiate.
- Bjork-spacing formally-undecided null — v4c/v4d traced to architectural absence of internal between-session replay; a scope-condition discovery, not a framework-disconfirmation.
Methods snapshot
Substrate. Qwen2.5-7B-Instruct, 4-bit NF4 LoRA (r=16, α=32), trained on fictive "Zorbetik" facts (invented domain designed to eliminate pretraining priors).
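For readers reproducing the setup, the stated hyperparameters can be collected in a small config object. This is a sketch: the class name `PilotRunConfig` is hypothetical, and the paper's training stack is not named here. The keyword names mirror those used by Hugging Face `peft.LoraConfig` (`r`, `lora_alpha`) and `transformers.BitsAndBytesConfig` (`load_in_4bit`, `bnb_4bit_quant_type`), on the assumption that a comparable stack was used. Settings not given above (target modules, dropout, learning rate) are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PilotRunConfig:
    """Hyperparameters stated in the Methods snapshot; nothing else is assumed."""
    base_model: str = "Qwen/Qwen2.5-7B-Instruct"  # Hub id for the named model
    load_in_4bit: bool = True
    quant_type: str = "nf4"
    lora_r: int = 16
    lora_alpha: int = 32

    def quant_kwargs(self):
        """Keyword arguments in the shape expected by BitsAndBytesConfig."""
        return {"load_in_4bit": self.load_in_4bit,
                "bnb_4bit_quant_type": self.quant_type}

    def lora_kwargs(self):
        """Keyword arguments in the shape expected by peft's LoraConfig."""
        return {"r": self.lora_r, "lora_alpha": self.lora_alpha}

    @property
    def lora_scaling(self):
        """Standard LoRA update scaling, alpha / r."""
        return self.lora_alpha / self.lora_r


cfg = PilotRunConfig()
```

With r=16 and α=32 the standard LoRA scaling factor is 2.0, so the low-rank update is applied at twice its raw magnitude.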
Cross-substrate pairs. Mistral-7B-Instruct-v0.3, Llama-3.1-8B-Instruct, Llama-3.1-8B-base, Qwen2.5-3B-Instruct — tested via FT trajectory and ICL-inference paradigms.
Friction-intensity axes (1–6): task-friction (HRP-3M passive/surface/deep), chunking density, learning rate, sampling temperature, violation magnitude, Bjork spacing.
Distribution-shape axes (7–8): paraphrase-template count (v12b), [eighth axis specified in §4.7].
Pre-flight test-informativeness criterion. Substrate-task pairings must produce a ceiling-floor accuracy spread of ≥ 30 percentage points for between-condition tests to be informative. The criterion is stated upfront; ceiling-masked pairings (where the substrate saturates the task before the manipulation can show effect) are documented as scope-conditions, not framework-disconfirmations.
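The criterion reduces to a simple check. The sketch below assumes that "ceiling-floor spread" means the difference between the highest and lowest condition accuracy for a pairing; the function name and the example accuracies are hypothetical, not values from the paper.

```python
MIN_SPREAD_PP = 30.0  # threshold stated in the criterion


def is_informative(condition_accuracies_pct, min_spread_pp=MIN_SPREAD_PP):
    """True if the accuracy spread across conditions meets the pre-flight threshold."""
    spread = max(condition_accuracies_pct) - min(condition_accuracies_pct)
    return spread >= min_spread_pp


# Hypothetical pairings, for illustration only.
usable = is_informative([42.0, 61.0, 88.0])          # spread 46 pp
ceiling_masked = is_informative([96.0, 98.0, 99.0])  # spread 3 pp
```

A pairing that fails the check is recorded as a scope-condition rather than entered into the between-condition comparison.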
The four-way consolidation taxonomy
The v11c null on violation magnitude and the v12 differential preservation under orchestrated replay together force a refinement of Paper 6 §4.10:
- Functional consolidation (BFT, biological) — surprise-magnitude-proportionate consolidation machinery; native to biological substrate; produces the von Restorff effect and surprise-modulated retention.
- Orchestrated mechanical analog (FT, LLM + replay) — consolidation produced by explicit replay scheduling; LLM substrate gains a functional approximation but loses the surprise-proportionality.
- Pure mechanical substrate (FT, LLM without replay) — no consolidation; encoding is one-shot at the gradient step.
- Tool-augmented mechanical consolidation (FT, LLM + retrieval) — consolidation outsourced to external retrieval; framework-predicted, untested in this paper.
What this paper is, and is not
What it is: a pilot-scale empirical-calibration battery on LLM substrate; an empirical anchor for Paper 6's matched-friction-under-hysteresis schema; a forced refinement of Paper 6's consolidation taxonomy and axis-specific kinetics; a stated set of falsification commitments under the pre-flight test-informativeness criterion.
What it is not: a definitive cross-substrate calibration; a confirmed substrate-universal validation of matched-friction-under-hysteresis; a replacement for in-principle replication of any specific finding at production scale. Per-condition n in the range 4–30 means individual point estimates carry wide intervals; the directional findings are what the paper claims, replicated where stated, scope-conditioned where the test was masked.
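To give a sense of how wide those intervals are, the sketch below computes a 95% Wilson score interval for a binomial proportion at the two ends of the stated n range. It assumes accuracy is scored as a per-item proportion and uses an observed rate of 0.5 purely as an example; neither is taken from the paper.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


lo_small, hi_small = wilson_interval(2, 4)    # n = 4, observed 50%
lo_large, hi_large = wilson_interval(15, 30)  # n = 30, observed 50%
```

At n=4 the interval runs from roughly 0.15 to 0.85; at n=30 it narrows to roughly 0.33 to 0.67. Either way, a single condition's point estimate pins down little more than direction, which is why the paper restricts its claims to directional findings.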
Connections to other papers in the series
- Paper 1 (Friction Theory) — the substrate-universal framework whose race-axioms Paper 4's empirical battery targets.
- Paper 6 core (Matched Friction Under Hysteresis) — the schema paper. Paper 4 is the empirical companion; the v11c null + v12 differential preservation force the four-way consolidation taxonomy now in Paper 6 §4.10.
- Paper 4B (Substrates Encode Experience) — the inference-time companion. Paper 4 covers training-time SFT mechanics on weight-state substrate; Paper 4B covers inference-time ICL mechanics on context-state substrate. Cross-citation runs both ways.
- Paper 2 (Capacity scaling) — cloze-vs-application asymmetry across model sizes. Paper 4's substrate-graded axis-specific kinetics extend the capacity-substrate story to fine-tuning dynamics.
- Paper 2B (ICL/FT memory) — the working-memory / long-term-memory distinction. Paper 4 instruments the FT side (long-term) of that distinction with axis-specific kinetics.
- Paper 6BC (Two Candidate Readouts) — the substrate-signatures companion. Paper 4's empirical battery provides the substrate-level grounding for what 6BC reads out.
Read the paper
The full paper is on Zenodo (concept DOI 10.5281/zenodo.20059859).