You are not loading facts into the model

Paper 4C · Pødenphant Lund (2026) · Read on Zenodo

I use friction theory to understand what you can teach a model. When you fine-tune a model, or give it a few examples in the prompt, it feels like you are pouring information in: facts, knowledge, the right answers. You are not. What you are really doing is shaping how the model leans. You are teaching it a tendency, a way of responding, not a body of facts. The two most common ways people try to teach a model something both install a disposition, not data, and pushing harder on "this answer is just more correct" does not load the answer in. It bends the path the model takes to deliver what it already knows.

Two ways to teach a model

If you want a model to do something specific (check whether a question's assumptions are even true, say "I don't know" instead of bluffing, commit only when it is sure), you have two everyday tools: fine-tuning the model on examples, or putting a few examples in the prompt.

The folklore says fine-tuning is the strong, permanent tool and prompt examples are the weak, temporary one. Something more basic sits underneath that: the two tools do not just differ in strength. They differ in what kind of thing they can put into the model at all.

The picture: a landscape the answer flows through

Think of the model's options as a landscape of valleys. When a question comes in, it flows downhill and settles into whichever valley pulls hardest. Training, or a prompt example, reshapes that landscape: it deepens some valleys and flattens others, changing where answers tend to settle. The data itself is just water. The instruction is what carves the valleys. Raw facts with no instruction to shape them have nowhere to go.

So when you "teach" a model, you are doing landscaping. You are changing how it leans, not topping up a tank of facts.

The striking result: deepen the wrong valley and the model stops answering

The clearest case is a fine-tuning experiment on a large model. Train it on a short, well-fitting style of answer and it keeps almost all of its general ability (it scores 82% on a broad knowledge test, against 84% before training). Train it the same amount on a verbose, ill-fitting style and it collapses to 8% on the very same test.

What matters here is not how much data you used. It is whether the style of answer you trained fits the model's natural way of responding. The good style and the bad style used the same model, the same amount of data, the same settings. Only the shape of the answer was different, and that alone decided whether the model survived or crashed.

The crash destroys the behaviour, not the knowledge

You would assume an 8% score means the knowledge is gone. It is not. When the collapsed model is scored in a way that bypasses how it writes its answer (just reading which option it secretly rates highest), it gets 78%, almost back to normal. The training did not erase what the model knows. It broke its ability to deliver the answer in the expected format. The model still knows the answer; it just cannot get it out the door any more, because the wrong valley was dug so deep that every question rolls into it.
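To make the difference between the two scoring routes concrete, here is a minimal Python sketch. It assumes a multiple-choice test with options A to D and a model that is expected to write "Answer: X". The toy texts, the per-option scores, and the function names (`score_generated`, `score_by_option_ranking`) are invented for illustration and are not the paper's evaluation code.

```python
import re

OPTIONS = ["A", "B", "C", "D"]


def extract_letter(text):
    """Return the letter after 'Answer:' in the written output, or None."""
    match = re.search(r"Answer:\s*([A-D])\b", text)
    return match.group(1) if match else None


def score_generated(generated_texts, gold):
    """Accuracy when the model must deliver its answer in the expected format."""
    hits = sum(extract_letter(t) == g for t, g in zip(generated_texts, gold))
    return hits / len(gold)


def score_by_option_ranking(option_scores, gold):
    """Accuracy when we only read which option the model rates highest."""
    hits = 0
    for scores, g in zip(option_scores, gold):
        best = max(OPTIONS, key=lambda option: scores[option])
        hits += best == g
    return hits / len(gold)


# Toy data (hypothetical): a model stuck in a verbose style rarely reaches
# "Answer: X", but its internal option scores still favour the right letter.
gold = ["B", "D", "A", "C"]
generated_texts = [
    "Let me elaborate at length on the background of this question and",
    "Answer: D",
    "There are many perspectives one could take here, and each of them",
    "It is worth considering the wider context before",
]
option_scores = [
    {"A": -3.1, "B": -0.4, "C": -2.8, "D": -3.5},
    {"A": -2.2, "B": -2.9, "C": -3.0, "D": -0.3},
    {"A": -0.6, "B": -2.5, "C": -2.7, "D": -3.3},
    {"A": -1.9, "B": -0.8, "C": -1.2, "D": -2.6},  # a genuine miss
]

written_accuracy = score_generated(generated_texts, gold)
ranked_accuracy = score_by_option_ranking(option_scores, gold)
print(f"written: {written_accuracy:.2f}, ranked: {ranked_accuracy:.2f}")
```

In this toy case the written score is low while the ranked score stays high. That is the pattern described above: the delivery is broken, not the knowledge.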

You cannot pour correctness in through this route. You can only reshape the landscape, and if you reshape it badly, you block the exit.

The doubt example: you install caution, not accuracy

One of the behaviours people most want is for a model to question a faulty assumption. So the team taught models to check premises both ways, through fine-tuning and through prompt examples. The result: it did not make them any more accurate on hard reasoning questions. What it did was make them more cautious: more likely to say "I don't know" and more likely to flag a false assumption.

That caution is genuinely useful, but only on the right task. If a question has a hidden false assumption, the caution catches it. If the question is perfectly valid and just hard, the same caution makes the model refuse a question it could have answered. So whether teaching caution helps you depends entirely on how many of your questions contain false assumptions. There is even a formula for the break-even point.
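The paper's own formula is not reproduced here. As a rough sketch of how such a break-even point can be reasoned about, assume that caution gains `gain_per_catch` on each question with a false assumption and loses `cost_per_refusal` on each valid question it wrongly refuses. Both quantities and the function names are assumptions made for illustration. With a fraction p of false-assumption questions, caution pays off when p × gain > (1 − p) × cost, which puts the threshold at p = cost / (gain + cost).

```python
def break_even_fraction(gain_per_catch, cost_per_refusal):
    """Fraction of false-assumption questions above which caution pays off.

    Both arguments are hypothetical per-question quantities:
    the accuracy gained on a false-assumption question, and the
    accuracy lost on a valid question that gets refused.
    """
    if gain_per_catch < 0 or cost_per_refusal < 0:
        raise ValueError("gain and cost must be non-negative")
    total = gain_per_catch + cost_per_refusal
    if total == 0:
        raise ValueError("gain and cost cannot both be zero")
    return cost_per_refusal / total


def net_benefit(p_false, gain_per_catch, cost_per_refusal):
    """Expected per-question change from adding caution, for a given task mix."""
    return p_false * gain_per_catch - (1 - p_false) * cost_per_refusal


# Illustrative numbers only, not results from the paper.
threshold = break_even_fraction(gain_per_catch=0.6, cost_per_refusal=0.2)
for p_false in (0.1, 0.4):
    change = net_benefit(p_false, 0.6, 0.2)
    verdict = "helps" if change > 0 else "hurts"
    print(f"p_false={p_false}: caution {verdict} (threshold {threshold:.2f})")
```

The sketch makes one point only: the same trained caution can be a net gain or a net loss depending on the task mix, so it is worth estimating that mix before training.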

There is a twist worth knowing: modern aligned models are already very good at spotting false assumptions (they catch 89–100% of them out of the box). So teaching them more caution mostly cannot improve detection — the room to improve is already used up. The place caution should genuinely help is a raw, un-aligned model that has not yet learned to doubt.

A trap worth remembering: the truncation mirage

Here is a way to fool yourself, one that shows up when new results are compared against earlier numbers. If you cut off a model's answer after a fixed number of words, a model taught to answer briefly finishes inside the limit, while a model that reasons step by step gets cut off before it states its answer, and is scored wrong. So the brief style looks like it boosted accuracy, when all it did was finish in time. An apparent 10-point "lift" turned out to be entirely this effect. The cheap fix is to check how often each version actually committed to an answer, alongside its score.
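The check can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's harness: answers are assumed to be written as "Answer: X", the cut-off is counted in whitespace-separated words, and the toy outputs and helper names (`truncate_words`, `evaluate`) are invented.

```python
import re


def truncate_words(text, max_words):
    """Keep only the first max_words whitespace-separated words."""
    return " ".join(text.split()[:max_words])


def extract_answer(text):
    """Return the letter after 'Answer:', or None if the model never committed."""
    match = re.search(r"Answer:\s*([A-D])\b", text)
    return match.group(1) if match else None


def evaluate(outputs, gold, max_words):
    """Report accuracy together with how often an answer was actually stated."""
    committed = 0
    correct = 0
    for text, g in zip(outputs, gold):
        answer = extract_answer(truncate_words(text, max_words))
        if answer is not None:
            committed += 1
            correct += answer == g
    n = len(gold)
    return {
        "accuracy": correct / n,
        "commit_rate": committed / n,
        "accuracy_when_committed": correct / committed if committed else None,
    }


# Toy outputs (hypothetical). The step-by-step model is right every time,
# but usually states its answer after the cut-off.
gold = ["B", "C", "A", "D"]
brief = ["Answer: B", "Answer: C", "Answer: D", "Answer: D"]
stepwise = [
    "First consider what the question asks, then rule out the distractors one by one. Answer: B",
    "The second option fits. Answer: C",
    "Start from the definition and work through each case carefully before deciding. Answer: A",
    "Compare all four options against the constraint given in the question stem. Answer: D",
]

print("brief    @8 words:", evaluate(brief, gold, max_words=8))
print("stepwise @8 words:", evaluate(stepwise, gold, max_words=8))
print("stepwise @50 words:", evaluate(stepwise, gold, max_words=50))
```

A low commit rate next to a low accuracy is the warning sign: the score is measuring the cut-off, not the reasoning.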

So what does install facts?

None of this means facts can never be installed. There are dedicated methods that edit a fact straight into the weights, cleanly, without breaking anything, and I show one doing exactly that. The lesson is about using the right tool. Fine-tuning and prompt examples are tools for shaping dispositions. When you push facts through them as if they were a loading hose, you do not load the fact; you damage the model's delivery of what it already knew.

Why this matters

If you build with models, it reframes what training is for. Fine-tuning and prompting are how you shape behaviour and tendencies, not how you teach new facts. Match the tool to the job, and check that the style of answer you are training fits the model.

If you study minds, there is a wider hint. Teaching often shapes how a learner leans rather than dropping facts into storage, and pushing too hard against a learner's grain can break the delivery while leaving the knowledge intact.

The cite

Pødenphant Lund, T. (2026). Fine-Tuning and In-Context Learning Install Dispositions, Not Data. Zenodo. https://doi.org/10.5281/zenodo.20562086
