Same content, wider track

Paper 4 · Pødenphant Lund (2026g)

I study language models to understand people. In 1994 Robert Bjork pointed out something we take for granted in learning research today: you remember better when the material is presented in several ways, not when it is repeated the same way. This paper finds the same pattern in language models. That matters, because it suggests the Bjork effect is not a human quirk. It is a general mechanism in systems that can learn.

What Bjork said

In 1994 the American psychologist Robert Bjork pointed out that learning works better when it costs a little more. He called the phenomenon desirable difficulties. Put plainly: if you want a student to remember something in the long run, do not make it too easy.

One of the strongest examples is variety. Schmidt & Bjork showed in 1992 that practising in different ways beats repeating the same practice over and over, even when the total number of repetitions is the same. It holds in motor learning (sport, music, craft), and it holds in verbal memory. Rohrer & Pashler reviewed the evidence in 2007 and found the same thing: variety beats repetition.

The question of why it works has been answered differently over the decades. Most explanations point to something specific about humans: attention, motivation, the way the brain consolidates during sleep, the way the hippocampus organises episodic memory. This paper suggests something else.

Language models show the same pattern

If the Bjork effect is about something basic in the way learning works (and not about anything specifically human), then we should be able to see the same effect in a completely different "learning" system. A language model is a good place to look: it learns from examples, it forgets if the track is not strong enough, and we can see directly what it has learned.

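I made 25 invented facts in a fictional domain called "Zorbetik" (invented to make sure the model had not seen the material before) and taught them to a language model in two ways: with each fact always in one and the same phrasing, or with each fact in four different phrasings.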

The total number of "studies" was the same. Same model. Same training time. The only difference was how varied the material was when it was presented.

Afterwards I tested the model's memory by asking questions phrased in ways it had not seen during training. That is what a real memory test amounts to: not whether you can repeat what you were told, but whether you can use it in a new setting.

The result: 38% recall with one phrasing, 94% with four phrasings.

That is a difference of 56 percentage points. Same content. Same training time. Just wider variety in how it was presented.

[Figure: Recall on questions the model had not seen before. One phrasing: 38%. Four phrasings: 94%. Same 25 facts, same training time, +56 percentage points.]
The only difference between the two conditions was how varied the same facts were when presented during training. The test used questions phrased in ways the model had not seen before.

What it means

Finding the same pattern in a language model that Bjork found in humans tells us something important: the Bjork effect is probably not a human quirk. It is a general mechanism in systems that can learn. That is to say, the cause lies in how information is turned into lasting memory, whether the "house" is a human brain or a neural network.

It matters in practice. If you design teaching, courses, e-learning, or any kind of communication where you want the receiver to actually remember something afterwards, then the simplest adjustment you can make is this: vary how you present it. Do not repeat the same phrasing. Vary it.

Concretely, that could be:

None of this is new in teaching terms. It is exactly what Bjork has talked about for 30 years. What is new is that we now have a signal that it is not only a human trait. It is about how information becomes a lasting track in a learning system. Any learning system.

Seven other patterns from the same study

Besides the variety experiment, I also looked at seven other conditions that might affect how well the model learned. Four of them showed the same basic pattern: there is a sweet spot between too easy and too hard. Too easy = no learning. Too hard = also no learning. Somewhere in the middle = maximum learning.

It is the same inverted U-curve known from many other fields of learning research:

[Figure: A sweet spot between too easy and too hard. How much is remembered, plotted against how hard the material is. Too easy → little learning; sweet spot → most learning; too hard → little learning.]
Four of the other seven conditions in the study followed the same curve: a peak in the middle, a fall to either side. It is the shape that the Yerkes-Dodson law, Vygotsky's zone of proximal development (ZPD) and Sweller's cognitive load theory all describe.

That the same curve turns up in language models on four different axes suggests we may be seeing one underlying mechanism at work. The same one behind all the other observations.

What I don't know

This is a small study. Per condition the number of tests was between 4 and 30. That is enough to show direction, but not enough to say precisely how much variety is optimal or where the sweet spot lies for different kinds of material. A version 2 at larger scale is planned.
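One way to see why 4 to 30 tests per condition shows direction but not precision is to put a confidence interval around a measured proportion. The sketch below uses the Wilson score interval, chosen here for illustration and not necessarily the analysis used in the paper; the function name and the counts passed to it are hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a proportion (z = 1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical counts: the same observed proportion (one half) at the
# smallest and largest sample sizes mentioned in the text.
for successes, n in [(2, 4), (15, 30)]:
    low, high = wilson_interval(successes, n)
    print(f"n={n}: {low:.2f} to {high:.2f} (width {high - low:.2f})")
```

With only four tests the interval covers most of the range between 0 and 1, while thirty tests narrow it considerably. That is why a gap as large as the one between the two variety conditions can be visible at this scale, while the exact position of a sweet spot cannot.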

And even though the finding is consistent across several language-model families, these are still language models. We have good reason to believe the underlying mechanism is the same as in humans, but that is a supposition, not a proof. Settling it for good would require measurements in biological systems that match the ones we can make in a neural network. That is future work.

Read the paper

The full article is freely available on Zenodo (concept DOI 10.5281/zenodo.20059859):

Pødenphant Lund, T. (2026g). Same Content, Wider Track: Empirical Calibration of Friction Theory on LLM Substrate. Zenodo. https://doi.org/10.5281/zenodo.20059859
