What is a race?

The whole theory explained with water

I explain almost everything in this theory with water. It may sound silly, but water makes four hard ideas concrete: what a race is, why learning sets in, why your brain changes character under pressure, and how the system arrives at an answer. Here are the pictures, one at a time.

The tank and the pipes

Picture a big tank full of water. Several pipes lead out of it, let us say three. Each pipe is a possibility, what the theory calls a route.

Once the water has flowed in and settled, so that it is no longer turbulent, it starts to run out through the pipes. At the end of each pipe stands a new, empty tank. The tank that fills up first has won. That route won the race.

That is the whole idea of a race: several possibilities run at the same time, one reaches the finish first, the rest lose.

Figure: a race with several routes and one winner. A source tank feeds the routes; the tank that fills first is the winner, and the others lose.
Several routes run at the same time. The tank that fills first has won the race. Afterwards you only experience the winner, not the race itself.
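The tank picture can be put in a few lines of Python. This is only a minimal sketch under simple assumptions that are not part of the theory itself: each pipe has a constant flow rate, every receiving tank has the same volume, and the names `run_race`, `flow_rates` and `tank_volume` are invented for the illustration.

```python
def run_race(flow_rates, tank_volume=10.0):
    """Fill one empty tank per pipe and report which tank fills first.

    flow_rates: litres per second for each pipe (hypothetical units).
    Returns (index of the winning route, list of fill times).
    """
    # Time to fill a tank at constant flow is volume divided by rate.
    fill_times = [tank_volume / rate for rate in flow_rates]
    # The winner is the route with the shortest fill time.
    winner = min(range(len(fill_times)), key=fill_times.__getitem__)
    return winner, fill_times


# Three pipes, as in the picture: the middle one is the widest.
winner, fill_times = run_race([1.0, 2.5, 1.5])
print("winning route:", winner)
print("fill times:", fill_times)
```

Only `winner` is passed on. The fill times of the losing routes are what you never get to experience afterwards.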

And it does not stop there. The winning tank is itself the start of a new race, with its own pipes and its own receiving tanks. It is a long chain. The tanks are connected every which way, so a tank can take water from several pipes at once. That is how a brain, or a language model, runs: thousands of these little races, hooked together.

Figure: races hang together every which way. The winner of one race becomes the input to the next.
Each little tank is its own race. The winner becomes the input to the next, and the tanks share pipes every which way: a tank can be fed by several pipes and pass on to several. A brain or a language model is thousands of these races, coupled together in one network.
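A chain of races can be sketched the same way. The code below is a simplified illustration, not a model of a real brain or language model: it assumes equal tank volumes, so the fastest pipe always wins, and it follows only the winner from tank to tank. It leaves out the part where one tank is fed by several pipes at once. The dictionary layout and the name `follow_winners` are assumptions made for the example.

```python
def follow_winners(pipes, start, max_races=10):
    """Follow the winning route from tank to tank.

    pipes maps a tank name to {next tank: flow rate}. With equal tank
    volumes, the pipe with the highest flow rate fills its tank first.
    """
    path = [start]
    tank = start
    for _ in range(max_races):
        routes = pipes.get(tank)
        if not routes:
            break  # no outgoing pipes: the chain ends here
        # The winner of this race becomes the start of the next one.
        tank = max(routes, key=routes.get)
        path.append(tank)
    return path


pipes = {
    "source": {"a": 1.0, "b": 2.5, "c": 1.5},
    "b": {"d": 0.5, "e": 2.0},
    "e": {},
}
path = follow_winners(pipes, "source")
print(path)
```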

Where is the friction?

Friction is the price of settling the race. When one pipe is clearly the biggest and fastest, there is almost no friction: the winner is given in advance. When three pipes are equally good, the system stalls and wavers, and the friction is high.

That is exactly the signal I measure in language models. If the model is sure, the water runs clearly through one pipe. If it is in doubt, several tanks fill at the same rate, and I can read that straight off the model's output. High friction is a hint that the model is about to get it wrong, and that hint can be put to practical use.
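One common way to read such a signal off a model's output is the entropy of its next-token probabilities. The sketch below uses that as a stand-in for friction. This is an assumption made for illustration: the text does not say which measure is actually used, and the function names and example scores are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def friction(logits):
    """Normalized entropy: near 0 with one clear winner, 1 in a dead heat.

    Assumes at least two candidate routes.
    """
    probs = softmax(logits)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))


sure = friction([9.0, 1.0, 0.5])    # one pipe is clearly the biggest
unsure = friction([2.0, 2.0, 2.0])  # three pipes are equally good
print(f"sure: {sure:.3f}  unsure: {unsure:.3f}")
```

A threshold on this number could then flag the answers where several tanks were filling at the same rate. Where to put the threshold is an empirical question.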

The sand and the hose

Now a different picture. Take a hose and point it at some sand.

Hold the hose still in one spot and the water slowly digs a little channel. And here is the point: next time the water runs, it runs more easily that way, because the channel is already there. The system remembers where the water ran last. Physicists call this hysteresis: a system that carries traces of its own history. That is exactly what learning is. You do not learn by filing information away somewhere. You learn by digging channels that the next water runs through more easily.

Figure: a wider channel lets more water through. Narrow track: a little water gets through. Wide track (used many times): a lot of water gets through.
Pay a little extra friction now (dig a wider channel), get less friction later. That is why learning that costs a bit more in the moment sets in deeper.
Move the hose a little from side to side as you dig, and the channel gets wider. That is the important rule: a little extra resistance now means less resistance later. It is the idea behind what Bjork calls "desirable difficulties": learning conditions that feel harder in the moment but improve long-term retention.
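The rule can be tried out in a toy model. Everything numerical here is an assumption chosen for the illustration: resistance is taken as inversely proportional to channel width, each pass widens the channel by a fixed `erosion` amount, and moving the hose (`wiggle`) is assumed both to cost extra now and to dig more. None of these constants come from the theory.

```python
def pour(width, wiggle=0.0, erosion=0.1):
    """One pass of water through a channel.

    Returns (resistance felt now, channel width after the pass).
    """
    resistance = (1.0 + wiggle) / width                 # wiggling costs extra now
    new_width = width + erosion * (1.0 + 2.0 * wiggle)  # but digs a wider channel
    return resistance, new_width


def train(passes, wiggle):
    """Run several passes and record what each one cost."""
    width = 1.0
    costs = []
    for _ in range(passes):
        cost, width = pour(width, wiggle)
        costs.append(cost)
    return costs, width


easy_costs, easy_width = train(5, wiggle=0.0)  # hose held still
hard_costs, hard_width = train(5, wiggle=0.5)  # hose moved side to side

# Later: one ordinary pass through each channel, without wiggling.
easy_later, _ = pour(easy_width)
hard_later, _ = pour(hard_width)
print(f"first pass:  easy {easy_costs[0]:.2f}, hard {hard_costs[0]:.2f}")
print(f"later pass:  easy {easy_later:.2f}, hard {hard_later:.2f}")
```

The channel width is the memory: the same water meets a different resistance depending on what ran there before, which is the hysteresis described above.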

The wide and the thin overflows

Think of the layers in the brain as a cascade of tanks that spill over into one another. The deep, old layers are wide overflows: they can move a lot of water at once. The higher, precise layers are thin overflows: fine and fast, but with less capacity. Water spills over both all the time, just in different amounts.

When there is not much pressure, the thin overflows cope fine, and you think precisely and with nuance. As the pressure rises, more water arrives than the thin ones can take, and the wide overflows carry more and more of it. That is you under pressure, falling back on the old, coarse habits. And if the pressure gets too high, even the wide ones cannot keep up: the water spills over every edge at once. That is an overload.

Three terraced stone basins like bird-baths. Each basin is filled to the brim, and the water spills over the edge down into the next, lower basin in a continuous cascade.
This is how I picture it: terraced tanks where the water spills over the edge from one down into the next. Some of those overflows are wide (the deep, old layers) and some thin (the precise layers), and under pressure the wide ones carry more and more, until it all spills over.
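The three regimes (precise, coarse, overloaded) can be shown with a small capacity model. It is a deliberate simplification: here the thin overflows fill up first and the wide ones take only the excess, whereas in the picture water spills over both all the time. The capacities and the name `route_water` are assumptions for the example.

```python
def route_water(inflow, thin_capacity=2.0, wide_capacity=8.0):
    """Split incoming water between thin overflows, wide overflows and spill."""
    thin = min(inflow, thin_capacity)         # precise layers take what they can
    wide = min(inflow - thin, wide_capacity)  # old, coarse layers take the excess
    spilled = inflow - thin - wide            # anything left over is overload
    return {"thin": thin, "wide": wide, "spilled": spilled}


calm = route_water(1.0)         # low pressure: the thin overflows cope alone
pressured = route_water(6.0)    # rising pressure: the wide ones carry most
overloaded = route_water(12.0)  # too much: water goes over every edge
for name, result in [("calm", calm), ("pressured", pressured), ("overloaded", overloaded)]:
    print(name, result)
```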

The landscape the water seeks out

Last picture. Picture a landscape of hills and valleys. Pour water over it, and it runs down and gathers in the low points.

The low points are the solutions, the places the system can land. Some valleys are deep (good solutions), some are shallow (half-good solutions you can get stuck in), and in some places there are flat shelves where the water just sits still. That is how a brain, or a model, finds its answer: it rolls down the landscape to a low point.

A 3D loss landscape: a wavy sheet in three-dimensional space with hills and valleys. The dark, low points are solutions; the deepest is the best. Contour lines are projected onto the floor.
The real picture: a wavy sheet in three dimensions. The low points (the dark valleys) are the solutions the system can land in. The deepest is the best, and a shallow valley is a local low point you can get stuck in. I draw only three dimensions because that is what we can picture, not because there are only three: in reality it happens in thousands of dimensions at once. The mechanics are the same.
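Rolling downhill to a low point is what plain gradient descent does. The sketch below uses a made-up landscape in a single dimension, with one deep and one shallow valley, to show how the starting point decides where the water ends up. The function `height`, the step size and the starting points are all assumptions for the illustration.

```python
def height(x):
    """A toy landscape: deep valley near x = -1.30, shallow one near x = 1.13."""
    return x**4 - 3 * x**2 + x


def slope(x):
    """Derivative of height: which way is downhill, and how steeply."""
    return 4 * x**3 - 6 * x + 1


def roll_down(x, step=0.01, iterations=1000):
    """Take small steps downhill until the position settles in a valley."""
    for _ in range(iterations):
        x -= step * slope(x)
    return x


left = roll_down(-2.0)   # starts on the left slope
right = roll_down(2.0)   # starts on the right slope
print(f"from -2.0: x = {left:.2f}, height = {height(left):.2f}")
print(f"from +2.0: x = {right:.2f}, height = {height(right):.2f}")
```

Starting on the right, the water settles in the shallow valley and never finds the deeper one. That is what getting stuck in a local low point means.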

Those were the pictures

These four pictures are not the theory itself. They are the scaffolding I build it on. But they catch the mechanics: water racing through routes, channels that get wider as you use them, overflows of different widths that take over at different pressures, and a landscape the water seeks out.

If you want to go further, the rest of the site builds on these pictures: