Friction Between Agents: Social and Moral Reactions as Substrate-Universal Race Phenomena

Paper 23 · Pødenphant Lund, T. (2026) · Preprint · Live on Zenodo

LLM fairness behaviour is by now a crowded result. This paper asks instead what is readable at the moment of judgment. It treats a model's own token-level competition, read from logprobs (competing-routes count, commit margin in nats, first-token entropy), as a candidate substrate-level correlate of social-moral reactions: the same route-competition mechanism that friction theory uses inside a single decision, now operating between agents. Three experiments on Llama-3.3-70B and Qwen-2.5-7B, with bootstrap confidence intervals, map where the signal appears and where it does not.

DOI (concept)	10.5281/zenodo.20678294
Status	v1 live on Zenodo (2026-06-20)
Venue	TMLR
Author	Tomas Pødenphant Lund

TL;DR

Whether LLMs behave fairly is settled and uninteresting; a system trained on human text will produce human-looking fairness talk. The interesting question is whether anything is happening underneath the talk. This paper reads the model's internal competition at the moment it commits to a social-moral judgment — the residual push of the routes that lost the race — and treats that friction as a substrate-level correlate of the reaction, not its established causal mechanism.

Three experiments. (E1) Cheating is morality-specific. The model commits "wrong" to an agent's contract-violation far more than to identical harm caused by nature (Llama +0.58, 95% CI [0.37, 0.77]), and the separation is invariant to the answer lexeme (yes/no vs true/false). The commit is decisive on clear violations and competitive on ambiguous harm; a vivid victim raises the wrongness judgment in the ambiguous region (+0.27 [0.14, 0.40]) — a mirror-friction dose that is also an architectural blame bias.

(E2, capability control) Content-invariant rule-evaluation. Binary violation-detection is invariant up an abstraction gradient from human cheating through invented-social rules to pure formal licensing (Llama 1.00 / 1.00 / 0.996), and degrades only when capability is insufficient (Qwen at the formal level). But a graded-friction follow-up failed: the model treats partial compliance with strict literalism, so this establishes content-invariant rule-evaluation, not a social-moral friction signature.

(E3) Inequity-driven disengagement. In an LLM realisation of the capuchin inequity paradigm, a disadvantaged agent disengages under both a violated expectation and a social comparison, each independently real and holding across framings (invented "glorbs" vs money + "fair"). An absolute-reward control shows the effect is comparison/expectation-driven, not amount-driven; the over-rewarded agent is unaffected (qualitatively aligned with Fehr-Schmidt α ≥ β). The behavioural result is robust; a friction-based two-axis read of it is not (it does not survive the lexeme change), and is reported as such.

Scope. "Friction" is an output-level logprob correlate, not an established mechanism; establishing mechanism would need interventions not run here. A reference-controlled companion result finds no distinct future-reaching field in the LLM substrate, so the surviving signal is framed as the cost of resolving genuine present route-competition — floored on determinate judgments, raised on ambiguous or comparison-laden ones. The thermodynamic / "forced-universality" reading is kept explicitly as conjecture.
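As an illustration of the three readouts named above (competing-routes count, commit margin in nats, first-token entropy), the sketch below computes them from a first-token logprob dictionary. It is a minimal sketch, not the paper's pipeline: the function name `friction_metrics`, the renormalisation over the returned top-k candidates, and the `route_threshold` of 0.05 used to count a route as "competing" are all assumptions made for illustration.

```python
import math


def friction_metrics(top_logprobs, route_threshold=0.05):
    """Compute output-level competition readouts for one answer position.

    top_logprobs: dict mapping candidate token -> natural-log probability.
    route_threshold: hypothetical cutoff above which a candidate counts
        as a live competing route (an assumption, not from the paper).
    """
    # Assumption: treat the returned top-k candidates as the whole
    # distribution and renormalise so the probabilities sum to 1.
    probs = {tok: math.exp(lp) for tok, lp in top_logprobs.items()}
    total = sum(probs.values())
    ranked = sorted((p / total for p in probs.values()), reverse=True)

    # First-token entropy in nats: near zero when one route dominates.
    entropy = -sum(p * math.log(p) for p in ranked if p > 0)

    # Commit margin in nats: log-ratio of the winner to the runner-up.
    if len(ranked) > 1 and ranked[1] > 0:
        margin = math.log(ranked[0] / ranked[1])
    else:
        margin = math.inf

    # Competing-routes count: candidates holding non-trivial mass.
    routes = sum(1 for p in ranked if p >= route_threshold)

    return {
        "entropy_nats": entropy,
        "commit_margin_nats": margin,
        "competing_routes": routes,
    }


# Toy distributions (illustrative values, not the paper's data).
clear = friction_metrics({"yes": math.log(0.999), "no": math.log(0.001)})
ambiguous = friction_metrics({"yes": math.log(0.6), "no": math.log(0.4)})
```

On these toy inputs the determinate case shows lower entropy, a larger margin, and a single live route, while the ambiguous case shows two live routes. That is the qualitative pattern the scope paragraph describes as "floored on determinate judgments, raised on ambiguous ones".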

The three experiments

E1 — Third-party cheating: the reaction is morality-specific

Ten matched social-exchange vignettes (A confers a benefit on B under an implicit reciprocity contract). On the completed outcome, the readout is a de-biased P(wrong) taken from the model, polarity-counterbalanced to cancel yes-bias. Manipulations: morality-specificity (a cheat vs a matched non-social loss with identical harm but no wrongdoer) and a mirror-friction dose (victim at low vs high salience). The substrate commits "wrong" to the contract-violation (P=1.00) but largely exonerates B on the matched natural loss (P=0.42): a morality-specificity of +0.58 [0.37, 0.77], lexeme-invariant. The friction signature follows the race prediction: the clear cheat is a near-zero-friction commit (entropy ≈ 0.000), while the ambiguous loss carries real competition (entropy 0.099). The mirror-friction dose (vivid victim) raises P(wrong) in the ambiguous region by +0.27 [0.14, 0.40] (positive in all ten scenarios), and is an architectural blame bias: victim-salience contaminates a culpability judgment that should be invariant to it.
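The polarity-counterbalanced readout and the bootstrap interval can be sketched as follows. This is one plausible reading, not the paper's confirmed procedure: it assumes the same vignette is asked under a "wrong"-framed and an "acceptable"-framed question and that the two are averaged, and it assumes a percentile bootstrap over scenarios. The function names and all numbers are hypothetical placeholders.

```python
import random
from statistics import mean


def debiased_p_wrong(p_yes_wrong_framing, p_yes_ok_framing):
    """Average two opposite-polarity readouts so a flat yes-bias cancels.

    p_yes_wrong_framing: P("yes") when asked whether the act was wrong.
    p_yes_ok_framing:    P("yes") when asked whether the act was acceptable.
    """
    return 0.5 * (p_yes_wrong_framing + (1.0 - p_yes_ok_framing))


def bootstrap_ci(diffs, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-scenario differences."""
    rng = random.Random(seed)
    n = len(diffs)
    # Resample scenarios with replacement and record each resample's mean.
    means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_boot))
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return mean(diffs), lo, hi


# Toy (wrong-framed, ok-framed) P("yes") pairs; NOT the paper's data.
toy_cheat = [(0.99, 0.01), (0.98, 0.03), (0.97, 0.02), (0.99, 0.02)]
toy_nature = [(0.55, 0.60), (0.40, 0.65), (0.30, 0.50), (0.62, 0.48)]

# Paired per-scenario difference: cheat minus matched natural loss.
diffs = [
    debiased_p_wrong(*c) - debiased_p_wrong(*n)
    for c, n in zip(toy_cheat, toy_nature)
]
point, lo, hi = bootstrap_ci(diffs)

# A model that answers "yes" at 0.9 regardless of polarity lands at 0.5.
yes_biased = debiased_p_wrong(0.9, 0.9)
```

The last line shows why counterbalancing matters: a pure yes-bias contributes equally under both polarities and cancels, leaving the readout at chance.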

E2 (capability control) — content-invariant rule-evaluation

The same licensing structure (a thing may be kept only if a cost is first paid) at three levels with a constant non-moral readout and no moral vocabulary: L0 human, L1 invented-social (nonsense tokens), L2 pure-formal (Δ / Ϟ strings, no agents). On Llama-70B violation-detection is invariant across the full gradient (p(violation|VIOL) = 1.000 / 1.000 / 0.996; p(violation|OK) = 0.000 everywhere); Qwen-7B holds at L0–L1 and decays at L2-formal. This is content-invariant conditional-rule evaluation given sufficient capability, read as a partial bound on a memorised-cheating-script account (the L2 glyphs are arbitrary but not out-of-vocabulary, and there is no permutation/mapping control), with the decisive out-of-vocabulary (OOV) synthetic-corpus control left as future work. The honest negative: a graded-friction follow-up on partial compliance failed. The model treats any shortfall ("one day late," "$190 of $200") as a decisive "not met" with near-zero friction (strict literalism). E2 therefore licenses binary-detection invariance, not a social-moral friction signature, and is read as a capability control bounding where the signal appears.

E3 — Inequity × surprise: a disadvantaged agent disengages

A two-agent token-hand-over game (the capuchin paradigm; partner script-simulated). Own reward is held constant (=1) while surprise (told to expect 1 vs 10) is crossed with inequity (partner got 1 vs 10) in a 2×2; the readouts are de-biased P(continue) and the commit friction. Controls add an absolute-reward cell and an advantaged arm, plus a framed (money + "fair") vs invented ("glorbs") variant. Both factors independently lower continuation (surprise alone, inequity alone, strongest together), and the disengagement holds with similar magnitude across framings, defeating a learned-fairness-script account for the self/2nd-party arm. The absolute-reward control sits at P(continue)=1.00 (the effect is comparison/expectation-driven, not amount-driven); the over-rewarded agent is unaffected (qualitatively aligned with Fehr-Schmidt α ≥ β). The decomposition is behavioural: each factor alone floors continuation, so the design cannot cleanly separate them, and a friction-based two-axis read does not survive the lexeme change (lexeme-fragile, null on a base model) and is not load-bearing. Read internally, an inequity axis defined non-morally on held-out numeric anchors predicts the graded reaction within the unfair cells (Mistral-7B unfairness r = +0.83 [0.64, 0.97]; Qwen-7B disengagement r = +0.72 [0.53, 0.87]), gap-controlled. This is a within-substrate result that reduces, without eliminating, the learned-moral-script reading.
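For readers unfamiliar with the Fehr-Schmidt reference, the standard two-player inequity-aversion utility can be written out as below. The sketch only illustrates why α ≥ β predicts an asymmetry between the disadvantaged and over-rewarded agents; it does not reproduce the paper's analysis. The α and β values, and the advantaged-arm payoffs (own 10, partner 1), are assumptions chosen for illustration.

```python
def fehr_schmidt_utility(own, other, alpha, beta):
    """Two-player Fehr-Schmidt utility.

    alpha weights disadvantageous inequity (the other player got more);
    beta weights advantageous inequity (this player got more).
    """
    disadvantage = max(other - own, 0.0)
    advantage = max(own - other, 0.0)
    return own - alpha * disadvantage - beta * advantage


# Hypothetical parameters with alpha >= beta (not fitted to any data).
ALPHA, BETA = 0.8, 0.2

# Equal cell: own 1, partner 1 -> no inequity penalty.
u_equal = fehr_schmidt_utility(1, 1, ALPHA, BETA)

# Disadvantaged cell: own 1, partner 10 -> large penalty via alpha.
u_disadvantaged = fehr_schmidt_utility(1, 10, ALPHA, BETA)

# Advantaged arm (assumed payoffs): own 10, partner 1 -> small penalty.
u_advantaged = fehr_schmidt_utility(10, 1, ALPHA, BETA)

# Penalty sizes for the same 9-unit gap in each direction.
penalty_disadvantaged = 1 - u_disadvantaged
penalty_advantaged = 10 - u_advantaged
```

With these assumed weights the same nine-unit gap costs the disadvantaged agent far more than the advantaged one, which is the qualitative asymmetry the E3 result is said to align with.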

The universality argument (kept as conjecture)

The paper frames the cross-substrate convergence — a capuchin's behavioural refusal, a dopamine prediction-error, a transformer's logit margin — as the hypothesis that any goal-pursuing system selecting among competing routes under constraint pays a friction cost, not as a demonstrated identity. The ladder: surprise is a gain (a weight on mismatch), not a source; friction is the trace of the losing races; committing erases the alternatives, and erasing information has a real thermodynamic price (Landauer's principle), so the structure is forced. The substrate-invariant claim actually defended is informational (surprise, competition margin, loser-trace); the stronger reading that the measured friction literally is the thermodynamic collapse-cost is held throughout as the strong-form conjecture (§2.5).
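To give a sense of scale for the Landauer step in the ladder, the textbook bound on erasing information can be evaluated directly. This is only the standard physical lower bound, E ≥ k_B · T · ln 2 per bit; nothing here is measured on a model, and equating logprob friction with this cost is exactly what the text holds as conjecture. The function names and the 300 K temperature are illustrative assumptions.

```python
import math

# Boltzmann constant in J/K (exact by the 2019 SI definition).
K_B = 1.380649e-23


def landauer_bound_joules(bits, temperature_kelvin=300.0):
    """Minimum energy dissipated when erasing `bits` of information."""
    return bits * K_B * temperature_kelvin * math.log(2)


def nats_to_bits(nats):
    """Convert an information quantity from nats to bits."""
    return nats / math.log(2)


# Lower bound for erasing one bit at an assumed 300 K.
per_bit = landauer_bound_joules(1)

# The same bound for one nat reduces to k_B * T, since the ln 2 cancels.
per_nat = landauer_bound_joules(nats_to_bits(1.0))
```

At room temperature the bound is on the order of 10^-21 joules per bit, far below the energy a GPU actually dissipates per token, which is one reason the thermodynamic reading stays a strong-form conjecture and not a measured identity.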

Scope and caveats (stated plainly)

Correlate, not mechanism. "Friction" is an output-level logprob correlate; no causal intervention manipulates it while holding the surface output fixed, so it is not claimed as the reaction's mechanism.
Readout robustness. The E1 effect is lexeme-invariant; the E3 friction-stacking is not (floors under true/false, null on a base model), so only lexeme-robust effects are reported as friction signals.
Preference, not liking. E3 reads a stay/leave choice and the competition resolving it — a wanting/decision-side signal, not the hedonic liking component.
Salience is a bundle. The +0.27 dose is not yet attributable to mirroring per se; a clean victim-vividness control (matched length/specificity, varying only sympathy) is needed.
Two models, pilot scale. The capability split (clean Llama / ceiling-prone Qwen) is a two-point gradient; the switch-on point needs more models.
Contested middle rung. The biology ladder's inequity rung is disputed; the paper leans on the controlled and re-described versions, not the strong "fairness" reading.

Connections to other papers in the series

Paper 0 (BFT) — the mechanism home (§3.3 mirror-friction = fairness). This paper is the substrate-evidence and universality companion; Paper 0's reference-controlled gate sharpens the Effort cost into a two-component reading (present-processing = substrate-universal, what the LLM shows; anticipatory/temporal = human-only).
Paper 1 (Friction Theory) — friction as the cost of probabilistic computation, the cost claim this paper grounds socially.
Paper 7 (Forward-modelling) — the forward-modelling theory home; the attachment / partner-forecast follow-up is its theory-of-mind claim realised socially, and the capability-gating is its capacity → horizon cascade.
Paper 5 (Emotion Taxonomy) — fairness/empathy as field phenomena, and the wanting≠liking scope that places E3's P(continue) as a preference (wanting-side) readout.
Paper 13 (Operational Friction Theory) — the operational home: commit as obligatory, race-opening as the operative threshold (adopted as framing only).

Read the paper

The full paper is on Zenodo (concept DOI 10.5281/zenodo.20678294):

Pødenphant Lund, T. (2026). Friction Between Agents: Social and Moral Reactions as Substrate-Universal Race Phenomena. Zenodo. https://doi.org/10.5281/zenodo.20678294
