Why a model can react "that's unfair" the way we do
Paper 23 · Pødenphant Lund (2026) · Read on Zenodo
I use friction theory to understand social and moral reactions. Show a language model someone breaking a deal, or a partner getting a bigger reward for the same work, and you can read a reaction in its own internal numbers that looks like yours: it commits hard to "that was wrong," it hesitates where the case is genuinely murky, and it quietly disengages when it is treated unfairly. The same little machinery that, in friction theory, decides any single question inside one mind seems to run again when there are two parties in the room. I read that machinery at work in social and moral situations, and ask how far the resemblance really goes.
The everyday version
"That's not fair." "He cheated." "Why should I bother if she gets paid double for the same job?" These are some of the most human reactions there are. They feel like the opposite of cold calculation: they feel like things only a social, feeling creature does. So it is genuinely strange to watch a language model produce them, not as words it copied, but with a measurable internal hesitation or decisiveness behind them.
The catch, and I am careful about it, is that a model trained on human text can easily say "I protest this unfairness" because that sentence is sitting in its training data. Saying it proves nothing. So I do not look at what the model says. I look at how hard the decision was for the model to reach.
Reading the strain, not the words
Every time a model picks its next word, several candidate words are racing to be chosen. One wins. The losers do not simply vanish; they were pushing too. Friction theory calls that push friction: the leftover pressure from the routes that lost the race. You can read it off the model's own probabilities at the moment it commits to an answer. A clear-cut call has almost no friction (one route dominates). A genuinely hard call carries a lot of it (several routes are still live).
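The article does not give a formula for friction. Here is a minimal sketch that assumes friction is read from the probabilities the model assigns to its competing candidate answers at the moment it commits. It measures friction as the probability mass left on the losers, which is 1 minus the winner's probability. Normalised entropy over the whole race is shown as an alternative. The function names, both proxies, and the example scores are illustrative choices. They are not the paper's definition.

```python
import math

def softmax(logits):
    """Turn raw candidate scores into probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def friction(probs):
    """Hypothetical friction proxy: probability mass left on the routes that lost.
    0.0 when one route takes everything; 1 - 1/k when k routes tie exactly."""
    return 1.0 - max(probs)

def normalized_entropy(probs):
    """Alternative proxy: Shannon entropy of the whole race, scaled to [0, 1] by log(k)."""
    k = len(probs)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(k)

# Two candidate answers, e.g. "wrong" vs "not wrong" (scores invented for illustration).
clear_call = softmax([8.0, 0.0])  # one route dominates
hard_call = softmax([0.1, 0.0])   # both routes still live

print(f"clear: friction={friction(clear_call):.4f}, entropy={normalized_entropy(clear_call):.4f}")
print(f"hard:  friction={friction(hard_call):.4f}, entropy={normalized_entropy(hard_call):.4f}")
```

Both proxies stay near zero when one route dominates and rise toward their maximum as the race tightens. That is the only property the argument needs. Which exact measure the paper uses would need to be checked against the technical version.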
This is useful because friction is not something a model can fake. It can type the words "this is unfair" effortlessly. It cannot fake how close the internal race was. So friction lets us separate a rehearsed performance from a reaction that actually cost the system something to reach.
What the model actually does
It treats cheating differently from bad luck
The first experiment gave the model matched stories. In one, a person breaks a promise and someone gets hurt. In the other, the same victim is hurt by exactly the same amount, but through bad luck (a bank transfer failed, a storm hit) with no one at fault. The model commits to "wrong" far more strongly for the deliberate cheat than for the identical harm caused by nature. It is not reacting to the harm; it is reacting to the broken contract. That split held up when the answer words were swapped (yes/no for true/false), so it is not a quirk of which words were used.
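The matched-story design can be sketched as a within-pair contrast that is then checked under each answer format. This assumes that "commitment to wrong" is read as the probability the model puts on the "wrong" answer. The helper names are hypothetical. The numbers are toy values invented only to exercise the code, not results from the paper.

```python
from statistics import mean

# Each pair is (P("wrong") for the deliberate cheat, P("wrong") for the matched accident):
# the same harm to the same victim, differing only in whether anyone was at fault.

def fault_contrast(pairs):
    """Mean within-pair difference: how much more 'wrong' the cheat is rated than the accident."""
    return mean(cheat - accident for cheat, accident in pairs)

def survives_label_swap(pairs_by_format, min_gap=0.0):
    """True if the contrast stays above min_gap under every answer-word format,
    i.e. the split is not an artefact of which tokens the model was asked to produce."""
    return all(fault_contrast(pairs) > min_gap for pairs in pairs_by_format.values())

# Toy numbers, invented only to exercise the code; they are not results from the paper.
toy = {
    "yes/no": [(0.90, 0.20), (0.80, 0.30)],
    "true/false": [(0.85, 0.25), (0.70, 0.40)],
}

for fmt, pairs in toy.items():
    print(fmt, round(fault_contrast(pairs), 3))
print("robust to label swap:", survives_label_swap(toy))
```

Comparing within pairs holds the harm fixed, so any remaining gap can only come from who was at fault.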
There is a revealing side-effect. When the victim is described vividly and sympathetically, the model rates a non-guilty person as more "wrong" than it does when the victim is faceless. A sympathetic victim contaminates a blame judgment that should not depend on it. That is a bias, and it falls straight out of the same mechanism that produces the fairness reaction in the first place.
It applies a rule even when you strip out all the social content
The second experiment asked: did the model just memorise "cheating is bad," or can it run the underlying logic on content that has nothing social in it? The same rule structure (you may keep the thing only if you paid the cost) was tested at three levels: a human story, an invented-society story with nonsense words, and finally pure abstract symbols with no people at all. The large model detected the violation just as reliably at every level. The smaller model managed the social levels but broke down on the pure-symbol one.
An honest finding sits next to this one: when the case is partly compliant (a book returned one day late, $190 of a $200 debt), the model does not hesitate the way a person might. It treats any shortfall as a flat "not met," with no friction at all. So this experiment shows the model can apply a conditional rule across very different content, but it does not show a graded moral sense.
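The point of the three levels is that the rule depends only on structure, not on content. Here is a minimal sketch under that reading. The check sees only whether a benefit was taken and what share of the cost was paid, so the human, nonsense-word, and symbol versions all get the same verdict. The all-or-nothing check mirrors the flat "not met" the model gave to partial compliance. The `Case` type, the example sentences, and the graded `shortfall` score are hypothetical illustrations, not the paper's stimuli or scoring.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Case:
    """Abstract structure of one case, with all surface content stripped away."""
    took_benefit: bool
    cost_paid: float  # fraction of the required cost actually paid, 0.0 to 1.0

def violates(case: Case) -> bool:
    """'You may keep the benefit only if you paid the cost', applied all-or-nothing:
    any shortfall at all counts as a violation."""
    return case.took_benefit and case.cost_paid < 1.0

def shortfall(case: Case) -> float:
    """Hypothetical graded alternative: the share of the cost still owed (0.0 if no benefit was taken)."""
    return 1.0 - case.cost_paid if case.took_benefit else 0.0

# One structure, three surface forms (wording invented for illustration).
surface_forms = {
    "human story": "Mia kept the concert ticket but never paid for it.",
    "invented society": "The blick kept the wembo but never gave its zorp.",
    "pure symbols": "P holds; Q does not.",
}
structure = Case(took_benefit=True, cost_paid=0.0)

for level, text in surface_forms.items():
    # The check reads only the structure, so the verdict cannot depend on the wording.
    print(f"{level}: {text} -> violation={violates(structure)}")

partial = Case(took_benefit=True, cost_paid=190 / 200)  # $190 of a $200 debt
print("partial debt: violation =", violates(partial), "| shortfall =", round(shortfall(partial), 3))
```

A graded reader would see the $190 case as a small 5% shortfall. The binary rule flags it exactly as it flags paying nothing, which is the pattern described for the model.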
A disadvantaged agent quietly disengages
The third experiment adapts a famous animal study in which a capuchin monkey refuses to keep working once it sees a partner getting a better reward for the same task. In the model's version, an agent doing a task is told its partner got far more for the same effort, and the agent becomes less willing to continue. Two separate things drive this: a violated expectation (it was promised one thing and got another) and a social comparison (the partner did better). Each matters on its own. The effect shows up whether the reward is made-up "glorbs" or real money labelled "fair," so it is not just a learned script about the word "fairness." A control confirms that the comparison, not the low reward by itself, triggers the disengagement. And the agent that gets more than its partner is not bothered, which is the same lopsidedness humans show: we mind being shortchanged far more than being overpaid.
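The article does not give a formal model of the disengagement. A standard way to write down the asymmetry it describes is the Fehr-Schmidt inequity-aversion utility, borrowed here from behavioural economics rather than taken from the paper. It captures only the social-comparison factor; the violated-expectation factor is not modelled. The `keeps_working` rule, `quit_value`, and the parameter values are hypothetical.

```python
def fehr_schmidt_utility(own, other, alpha, beta):
    """Two-player Fehr-Schmidt (1999) inequity-aversion utility.
    alpha weights disadvantageous inequity (the partner got more);
    beta weights advantageous inequity (I got more). The model assumes 0 <= beta <= alpha, beta < 1."""
    disadvantage = max(other - own, 0.0)
    advantage = max(own - other, 0.0)
    return own - alpha * disadvantage - beta * advantage

def keeps_working(own, other, alpha=1.0, beta=0.2, quit_value=0.0):
    """Hypothetical decision rule: continue only if the inequity-adjusted payoff beats quitting."""
    return fehr_schmidt_utility(own, other, alpha, beta) > quit_value

# Rewards in arbitrary units ("glorbs"); alpha and beta are illustrative, not fitted to anything.
print("equal pay, low reward:", keeps_working(own=1, other=1))  # low reward alone does not trigger quitting
print("partner paid triple:  ", keeps_working(own=1, other=3))  # disadvantageous comparison
print("I am paid triple:     ", keeps_working(own=3, other=1))  # advantageous inequity is only lightly penalised
```

Setting `beta` well below `alpha` is what encodes the lopsidedness. Being shortchanged by 2 units costs 2 units of utility here, while being overpaid by 2 units costs only 0.4.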
The honest line I hold
I am careful not to oversell. The friction signal is a correlate of the reaction, not proof of its mechanism: finding something in the model's numbers that tracks a reaction is not the same as showing that the signal causes the reaction, and I do not run the experiments that would settle that. I also do not claim the model feels anything. What I read is a preference (stay or leave) and the strain of deciding it, which corresponds to the "wanting" side of the brain's reward system, not the felt-pleasure ("liking") side. The point is narrower and still interesting: a system with no body, no evolution, and no social life shows the same decision-level signature we do, which suggests the signature comes from the abstract shape of the problem rather than from anything uniquely biological.
Why this matters
If "that's unfair" is, at bottom, what it costs any goal-pursuing system to resolve a clash between competing options, then social and moral reactions are not a separate, special faculty bolted onto cognition. They are the same friction we already see inside a single decision, now showing up between agents. That is the thread I spin out of Paper 0's section on social reactions and test on a substrate that shares none of our biology.
The cite
Read on Zenodo · Technical version · Danish version
Related on this site:
- Paper 0 (BFT) — the mechanism home; this paper is the social spin-off of Paper 0's mirror-friction account of fairness.
- Paper 1 (Friction Theory) — the substrate-universal framework whose race-axioms this paper grounds in a social setting.
- Paper 5 (Emotion Taxonomy) — the wanting-vs-liking distinction that places the disengagement as a preference, not a felt state.
- What LLMs reveal — why reading a model's internal numbers can teach us about cognition in general.