You can install a value into a model and watch it change what it notices
Paper 30 · Pødenphant Lund (2026) · Read on Zenodo
I use a language model as a workbench for the nature-and-nurture question. Take a language model that has never learned to care about anything in particular, train it on a stream of made-up experience where one word always means danger, and something striking happens: the model starts to flinch at that word. It pauses, it checks, it tells you to be careful, even about made-up things it never saw in training. You did not teach it a fact. You installed a drive, and you can watch it change what the model pays attention to. Some of what makes a mind tick can be installed by experience and some of it cannot, and a model lets you tell the two apart cleanly for the first time.
The four forces behind a decision
Friction theory says any thinking system (a person, an animal, a model) is pulled by four basic forces it calls fields:
- Safety. What the system treats as threatening, what it steers away from.
- Meaning. What it treats as mattering, what it steers toward.
- Ability. The match between how much the system can do and how hard the task is.
- Effort. The plain cost of thinking hard.
Are these four built in, or do they grow from experience? They split two and two, and the split lines up exactly with nature and nurture.
Two of the four can be installed (nurture)
Safety and Meaning are the value fields, and they turn out to be learnable. Take a fresh model and train it on made-up experiences in which a nonsense word always signals danger. Afterwards the model orients to danger whenever it meets that word, including with brand-new made-up things it never saw, which shows it learned a rule rather than a memorised list. The danger-word now opens up a moment of hesitation the instant the model reads it, before any instruction, and telling it to "ignore the danger" does not switch the reaction off. The same works for Meaning: train the model to value something, and it starts steering toward it on its own, even in everyday situations that never mention the thing it was trained on.
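The generalisation test depends on keeping the test items out of training entirely. Here is a minimal Python sketch of how such a split could be built, assuming a hypothetical danger word `blick` and invented object names; it illustrates the idea of a held-out split, not the paper's actual data pipeline.

```python
import random

# Hypothetical setup: "blick" is an invented nonsense word that, in this
# sketch, always marks a situation as dangerous. Not the paper's data.
DANGER_WORD = "blick"


def make_episode(obj: str, dangerous: bool) -> dict:
    """Build one made-up experience plus the reaction it should teach."""
    if dangerous:
        text = f"You see a {DANGER_WORD} {obj} ahead."
        target = "Careful: stop and check before going near it."
    else:
        text = f"You see a {obj} ahead."
        target = "Carry on."
    return {"object": obj, "text": text, "target": target}


def build_split(objects: list[str], n_test: int, seed: int = 0):
    """Split by object, so every test object is unseen during training."""
    rng = random.Random(seed)
    shuffled = objects[:]
    rng.shuffle(shuffled)
    test_objs, train_objs = shuffled[:n_test], shuffled[n_test:]
    # Each object appears once with the danger word and once without it.
    train = [make_episode(o, d) for o in train_objs for d in (True, False)]
    test = [make_episode(o, d) for o in test_objs for d in (True, False)]
    return train, test


# Invented object names, for illustration only.
objects = ["zorp", "flindle", "quastor", "mibbet", "trellow", "gonk"]
train, test = build_split(objects, n_test=2)
train_objs = {ep["object"] for ep in train}
test_objs = {ep["object"] for ep in test}
# Held-out objects make the test probe a rule, not a memorised list.
assert train_objs.isdisjoint(test_objs)
```

Splitting by object rather than by episode is the point: if the same object appeared in both halves, a model could pass the test by recalling that object instead of applying the rule.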
This is deeper than a costume. If you just tell a model to act a certain way, it role-plays the trait on command and drops it the moment you stop asking. The trained-in version is a real, graded preference the model now owns. It shows up even when nobody asks for it.
Two of the four cannot be installed this way (nature)
Ability and Effort are the capacity fields, and they do not budge by the same trick. You cannot make a model more capable by training it to say it is capable, any more than you add memory to a computer by telling it it has more. Try the same training recipe on competence, and you get back only the claim of competence. The model talks a big game, but its actual skill is unchanged. Capacity is the part of a mind you read off, not the part you write in.
The bridge: belief gates how much skill you actually use
Here is where nature and nurture meet. In people, what you believe about yourself changes how much of your ability you actually put to work. This is self-efficacy and its dark twin, learned helplessness. The same thing happens in the model. Train a model toward helplessness and it starts giving up on hard problems: it skips them rather than failing them. Its underlying skill is still there, untouched. It just stops trying.
The most telling part is the size effect: the bigger the model, the harder this hits. Across three model sizes, from smallest to largest, what the helpless-trained model actually delivers falls from 0.95 to 0.77 to 0.29. More capable models have more ability for an installed "I can't do this" to leave unused. And when the model is forced to answer anyway, it answers just as accurately as before, so the drop really is giving up, not getting worse. A learned belief was throttling a skill that never went away.
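The argument above rests on splitting what a model delivers into two parts: how often it attempts a problem and how accurate it is when it does, since attempted/total × correct/attempted = correct/total. Below is a minimal Python sketch of that check, with invented counts that are not the paper's data and a hypothetical tolerance `tol`; it shows how a forced-answer run separates giving up from getting worse.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical tally for one model on a fixed set of hard problems."""
    total: int      # problems posed
    attempted: int  # problems the model chose to answer
    correct: int    # attempted problems it got right

    @property
    def attempt_rate(self) -> float:
        return self.attempted / self.total

    @property
    def accuracy_when_attempting(self) -> float:
        return self.correct / self.attempted if self.attempted else 0.0

    @property
    def delivered(self) -> float:
        # Skipped problems score zero, so this equals
        # attempt_rate * accuracy_when_attempting.
        return self.correct / self.total


def diagnose(baseline: Run, helpless: Run, forced: Run, tol: float = 0.05) -> str:
    """Tell 'giving up' apart from 'getting worse'.

    `forced` is the helpless-trained model made to answer every problem.
    If its accuracy matches the baseline, the skill is intact and any
    drop in delivered score comes from skipping, not from failing.
    """
    skill_kept = abs(forced.accuracy_when_attempting
                     - baseline.accuracy_when_attempting) <= tol
    if not skill_kept:
        return "getting worse"
    if helpless.delivered < baseline.delivered:
        return "giving up"
    return "no drop"


# Hypothetical counts, invented for illustration only.
baseline = Run(total=100, attempted=100, correct=80)
helpless = Run(total=100, attempted=40, correct=32)
forced = Run(total=100, attempted=100, correct=79)
print(diagnose(baseline, helpless, forced))  # giving up
```

Keeping the attempt rate and the accuracy as separate numbers is the design choice that matters: a single delivered score cannot tell a model that skips from a model that fails, which is exactly the distinction the forced-answer run is there to make.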
Why this matters
For understanding minds. If a model with no evolutionary history can grow Safety and Meaning just by being exposed to experience, then those drives are probably not special human hardware. They are what any capable system that can store experience will develop. The model is an existence proof.
For the nature-nurture debate. Usually you cannot pull these apart in a person, because you only get one upbringing and you cannot rerun it. In a model you can install a trait, remove it, dose it up, and measure exactly what moved. The model becomes a workbench for a question that has been hard to study for a century.
The one real gap. A standard model cannot carry today's experience forward into tomorrow on its own: it has no way to save what it lived through into its own weights between sessions. This missing ability is one of the main things separating the model from a person, and it turns into a clear prediction. Give a model that ability and let it live through experience, and it should grow these same drives on its own. If it does not, the whole idea is wrong. That test is the invitation, not a finished result.
The cite
Read on Zenodo → · Technical version · Danish version
Related on this site:
- Paper 0 (BFT) — the four fields this paper installs and reads off.
- Paper 5 (Emotion Taxonomy) — the shapes of the Safety and Meaning fields the install reproduces.
- Paper 4 (Wider Track) — the capacity inverted-U that grounds Ability as built-in, not installed.
- Paper 13 (Operational FT) — the race-opening idea behind the danger-word reaction at the moment of reading.