Same Content, Wider Track: Empirical Calibration of Friction Theory on LLM Substrate

Paper 4 · Pødenphant Lund, T. (2026g) · Preprint · Live on Zenodo

Pilot-scale empirical-calibration companion to Paper 6 (matched-friction-under-hysteresis), tested on LLM substrate via LoRA fine-tuning on fictive "Zorbetik" facts designed to eliminate pretraining priors. Four of five intra-session friction-intensity axes produce inverted-U parabolas; the fifth produces a framework-narrowing null that forces a refinement of the consolidation taxonomy. The v12b paraphrase-augmentation finding (same 25 facts: 1 template → 38% recall, 4 templates → 94% recall on matched substrate) is the empirical anchor for "varied friction = wider track."

DOI (concept): 10.5281/zenodo.20059859
Status: Preprint live as v1, 2026-05-27 (pilot-scale)
Cite-letter: 2026g
Per-condition n: 4–30 (mixed-model, mixed-paradigm portfolio)
Author: Tomas Pødenphant Lund

TL;DR

Friction Theory (FT, Paper 1) is the substrate-universal framework for bounded probabilistic computation systems satisfying the race axioms. The matched-friction-under-hysteresis formulation (Paper 6) predicts an inverted-U between substrate-level friction intensity and encoding outcome on any axis where friction scales monotonically. Behavioural Friction Theory (BFT, Paper 0) is the biological specialisation. Paper 4 is the empirical-calibration companion on LLM substrate.

Methods. Eight axes (six friction-intensity axes and two distribution-shape axes) were tested by fine-tuning Qwen2.5-7B-Instruct (4-bit NF4 LoRA, r=16, α=32) on fictive "Zorbetik" facts. Cross-substrate-paradigm tests ran on Mistral-7B-Instruct-v0.3, Llama-3.1-8B-Instruct, Llama-3.1-8B-base, and Qwen2.5-3B-Instruct using the FT-trajectory and ICL-inference paradigms.

Headline results. Four of five intra-session axes produce genuine inverted-U parabolas:

The fifth axis (violation magnitude, v11c) produces a framework-narrowing null on behavioural accuracy. Differential preservation under orchestrated replay in the v12 consolidation test confirms Level-0 encoding differentiation via distinctiveness-confidence rather than proportionality to surprise magnitude. The Bjork-spacing bonus parabola (v4c/v4d) produces a formally undecided null, traced to the architectural absence of internal between-session replay.
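One common way to operationalise "inverted-U" on a friction-intensity axis is to fit a quadratic to outcome versus intensity and require downward curvature with a vertex strictly inside the tested range. The sketch below assumes that test; it is not necessarily the procedure used in the paper, and `fit_quadratic` and `is_inverted_u` are hypothetical helper names.

```python
from typing import Sequence


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float, float]:
    """Least-squares fit of y = a + b*x + c*x**2 by solving the 3x3 normal equations."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three (x, y) pairs of equal length")
    # Power sums: s[k] = sum(x^k) for k = 0..4, t[k] = sum(x^k * y) for k = 0..2.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented normal-equation matrix [M | t] with M[i][j] = s[i + j].
    m = [[float(s[i + j]) for j in range(3)] + [float(t[i])] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("degenerate design: need at least three distinct x values")
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


def is_inverted_u(xs: Sequence[float], ys: Sequence[float]) -> bool:
    """True if the fitted parabola opens downward and its vertex lies strictly inside the tested x-range."""
    _, b, c = fit_quadratic(xs, ys)
    if c >= 0:
        return False  # upward-opening or flat: no interior maximum
    vertex = -b / (2 * c)
    return min(xs) < vertex < max(xs)
```

At pilot-scale n, a point estimate of the vertex is fragile, so a resampling estimate of its uncertainty would be a natural addition before reading any single curve as an inverted U.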

Distribution-shape extension (§4.7, NEW). Seventh and eighth axes sweep training-distribution shape rather than friction intensity. The v12b paraphrase-augmentation experiment provides the cleanest within-content demonstration of distribution shape as a substrate-friction parameter: the same 25 facts trained with 1 paraphrase template yield 38% paraphrase-robust memorisation, while the same facts trained with 4 paraphrase templates yield 94% (a lift of +56 pp with substrate, optimizer, total facts, and total epochs matched). This is the empirical anchor for "varied friction = wider track," inherited from the human desirable-difficulties literature (Schmidt & Bjork 1992; Rohrer & Pashler 2007).
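The v12b manipulation holds content fixed and varies only how many surface forms each fact is rendered in. The sketch below shows one way to build such training sets; the `Fact` structure, the four templates, and the helper names are all hypothetical (the paper's actual templates are not reproduced here). It also assumes that "matched total epochs" means the same number of passes, so the 4-template set contains four times as many examples per epoch; that reading is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str


# Hypothetical paraphrase templates, for illustration only.
TEMPLATES = [
    "The {relation} of {subject} is {obj}.",
    "{subject} has {obj} as its {relation}.",
    "Q: What is the {relation} of {subject}? A: {obj}",
    "In Zorbetik, the {relation} of {subject} is {obj}.",
]


def build_training_set(facts: list[Fact], n_templates: int, epochs: int) -> list[str]:
    """Render every fact through the first n_templates templates, repeated for `epochs` passes."""
    if not 1 <= n_templates <= len(TEMPLATES):
        raise ValueError(f"n_templates must be in 1..{len(TEMPLATES)}")
    one_epoch = [
        TEMPLATES[k].format(subject=f.subject, relation=f.relation, obj=f.obj)
        for f in facts
        for k in range(n_templates)
    ]
    return one_epoch * epochs


def lift_pp(baseline_pct: float, treatment_pct: float) -> float:
    """Percentage-point difference between two accuracy percentages."""
    return treatment_pct - baseline_pct
```

With the reported figures, `lift_pp(38, 94)` gives the +56 pp lift quoted above.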

Three Paper-6 refinements forced by Paper 4 data:

Distribution-shape cross-substrate matrix. The HRP-3M direction (deep > passive ≈ surface, in correctness or in the first-token CR distribution) is confirmed on 5 of 6 substrate × paradigm pairings, spanning two model families (Qwen, Llama), three sizes (3B/7B/8B), and two paradigms (FT trajectory and ICL inference). The sixth pairing is ceiling-masked: it fails the pre-flight test-informativeness criterion and is therefore treated as a scope-condition rather than a disconfirmation.
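For a scalar per-condition score such as correctness, the HRP-3M ordering can be written as an explicit check. This is a minimal sketch, assuming a hypothetical equivalence margin `tol` for "≈" and a required margin of the same size for ">"; neither threshold comes from the paper, and the check does not cover the first-token CR distribution variant, which would need a distributional comparison.

```python
def hrp3m_direction_holds(passive: float, surface: float, deep: float, tol: float = 0.05) -> bool:
    """Check deep > passive ≈ surface on per-condition scores (e.g. accuracy in [0, 1]).

    `tol` is a hypothetical margin: passive and surface count as "approximately
    equal" if they differ by at most tol, and deep must exceed both by more than tol.
    """
    roughly_equal = abs(passive - surface) <= tol
    deep_wins = deep > max(passive, surface) + tol
    return roughly_equal and deep_wins
```

Toy values such as `hrp3m_direction_holds(0.40, 0.42, 0.70)` (not results from the paper) satisfy the ordering, while a surface score far above passive does not.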

Two substrate-distinguishing friction-type axes (§11.6). Race-level friction, including reactance, is substrate-universal (empirical anchors in Paper 14, Paper 2B, and Paper 5). Two axes distinguish LLM substrate from biological substrate: self-continuity, confirmed absent on LLM substrate (§4.8, v10g); and surprise, which is biology-specific (§4.3 v11c null, together with the §11.5 "Dunning-Kruger without Mount Stupid" theoretical extension).

Scope and status. Paper 4 is a pilot-scale preprint timestamping preliminary empirical observations on LLM substrate. Per-condition n is in the range 4–30, across a mixed-model, mixed-paradigm portfolio. Individual findings should be read as exploratory pilot results that constrain framework development, not as definitive cross-substrate calibrations. A planned v2 revision will scale per-condition n on the core axes (see §10.1). A pilot scale-up attempt (Together SFT, n=100/condition at both LoRA r=16 and r=64) produced 0% accuracy, a substrate-saturation finding documented in §10.3.

Key findings

Methods snapshot

Substrate. Qwen2.5-7B-Instruct, 4-bit NF4 LoRA (r=16, α=32), trained on fictive "Zorbetik" facts (invented domain designed to eliminate pretraining priors).
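The LoRA hyperparameters above can be read through the standard LoRA update, in which a frozen weight matrix W_0 receives a trainable low-rank correction scaled by α/r (the default scaling in, for example, Hugging Face PEFT). The worked equation below assumes that standard scaling rather than a rank-stabilised variant; in a 4-bit NF4 (QLoRA-style) setup, W_0 is held in quantised form and only A and B are trained. Which weight matrices were adapted is not stated here, so d and k are left symbolic.

```latex
h = W_0 x + \frac{\alpha}{r}\, B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\; B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k}

r = 16,\; \alpha = 32 \;\Longrightarrow\; \frac{\alpha}{r} = \frac{32}{16} = 2,
\qquad \text{trainable parameters per adapted matrix} = r\,(d + k) = 16\,(d + k)
```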

Cross-substrate pairs. Mistral-7B-Instruct-v0.3, Llama-3.1-8B-Instruct, Llama-3.1-8B-base, Qwen2.5-3B-Instruct — tested via FT trajectory and ICL-inference paradigms.

Friction-intensity axes (1–6): task-friction (HRP-3M passive/surface/deep), chunking density, learning rate, sampling temperature, violation magnitude, Bjork spacing.

Distribution-shape axes (7–8): paraphrase-template count (v12b), [eighth axis specified in §4.7].

Pre-flight test-informativeness criterion. Substrate-task pairings must produce a ceiling-floor accuracy spread of ≥ 30 percentage points for between-condition tests to be informative. The criterion is stated upfront; ceiling-masked pairings (where the substrate saturates the task before the manipulation can show effect) are documented as scope-conditions, not framework-disconfirmations.
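The pre-flight criterion amounts to a simple gate applied before any between-condition comparison. This is a minimal sketch, assuming that "ceiling" is the accuracy of the most favourable reference condition and "floor" that of the least favourable one (for example, an untrained baseline), both in percent; the function names are hypothetical.

```python
def is_informative(ceiling_acc_pct: float, floor_acc_pct: float, min_spread_pp: float = 30.0) -> bool:
    """Pre-flight check: the ceiling-to-floor accuracy spread must be at least min_spread_pp points."""
    return ceiling_acc_pct - floor_acc_pct >= min_spread_pp


def classify_pairing(ceiling_acc_pct: float, floor_acc_pct: float, min_spread_pp: float = 30.0) -> str:
    """Label a substrate-task pairing before any between-condition test is run."""
    if is_informative(ceiling_acc_pct, floor_acc_pct, min_spread_pp):
        return "informative"
    # Too little headroom (e.g. ceiling-masked): record as a scope-condition, not a disconfirmation.
    return "scope-condition"
```

Because the threshold is fixed before the manipulation is run, a saturated pairing is excluded on headroom grounds alone, independent of which direction its result happens to point.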

The four-way consolidation taxonomy

The v11c null on violation magnitude and the v12 differential preservation under orchestrated replay together force a refinement of Paper 6 §4.10:

What this paper is, and is not

What it is: a pilot-scale empirical-calibration battery on LLM substrate; an empirical anchor for Paper 6's matched-friction-under-hysteresis schema; a forced refinement of Paper 6's consolidation taxonomy and axis-specific kinetics; a stated set of falsification commitments under the pre-flight test-informativeness criterion.

What it is not: a definitive cross-substrate calibration; a confirmed substrate-universal validation of matched-friction-under-hysteresis; a replacement for in-principle replication of any specific finding at production scale. Per-condition n in the range 4–30 means individual point estimates carry wide intervals; the directional findings are what the paper claims, replicated where stated, scope-conditioned where the test was masked.
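The caveat about wide intervals at n = 4–30 can be made concrete with a binomial confidence interval on accuracy. The sketch below uses the Wilson score interval as one standard choice; the interval method used in the paper itself is not stated here.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95% coverage)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return centre - half_width, centre + half_width
```

At the same observed 50% accuracy, `wilson_interval(2, 4)` spans roughly 0.15 to 0.85, while `wilson_interval(15, 30)` spans roughly 0.33 to 0.67, which is why the paper's claims rest on direction rather than on individual point estimates.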

Connections to other papers in the series

Read the paper

The full paper is on Zenodo (concept DOI 10.5281/zenodo.20059859):

Pødenphant Lund, T. (2026g). Same Content, Wider Track: Empirical Calibration of Friction Theory on LLM Substrate. Zenodo. https://doi.org/10.5281/zenodo.20059859
