The AI sees the opposite illusion to us

Paper 24 · Pødenphant Lund (2026)

A way to tell which of our mind's quirks are forced, and which are just accidents.

Show a person a circle ringed by big circles and the centre one looks smaller than it is. That is the famous Ebbinghaus illusion. Show the same picture to an AI that can see, and it does the opposite: it judges the centre circle as larger. We push a thing away from what surrounds it. The AI pulls it toward what surrounds it. On a whole family of these classic illusions the AI sees the opposite of what we see, and the same flip turns up in three completely different AI designs.

What this is really about

The mind is full of quirks. We misjudge sizes, we fall for framing, we remember the first thing in a list better than the middle. Some of these quirks are deep features of how any thinking thing has to work when it cannot compute everything at once. Others are just accidents of the particular hardware, like the blind spot in your eye where the optic nerve leaves. The trouble is that from the outside the two kinds look the same. With a person you only see the final answer, never the machinery that produced it, so you cannot tell a forced feature from a happy accident.

An AI language model is different. You can read, token by token, how confident it was and what it nearly said instead. That makes the hidden decision visible. The model becomes a measuring tool that can do something behaviour alone cannot: sort a quirk into one of three boxes.
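As a rough illustration of what "reading the confidence" can look like, the sketch below turns log-probabilities for a few candidate answer words into normalised probabilities and picks out the answer the model gave and the one it nearly gave. The function name `answer_confidence` and the numbers are made up for this example; how log-probabilities are actually obtained depends on the model, and this is not the paper's code.

```python
import math

def answer_confidence(logprobs):
    """Normalise log-probabilities over candidate answers into probabilities."""
    # Subtract the maximum before exponentiating, for numerical stability.
    top = max(logprobs.values())
    weights = {tok: math.exp(lp - top) for tok, lp in logprobs.items()}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}

# Hypothetical log-probabilities for the first answer token (illustrative only).
example = {"larger": -0.4, "smaller": -1.5, "same": -2.6}

probs = answer_confidence(example)
ranked = sorted(probs, key=probs.get, reverse=True)
best, runner_up = ranked[0], ranked[1]
```

Here `best` is what the model said and `runner_up` is what it nearly said instead; the gap between their probabilities is the kind of graded signal that a person's single final answer never exposes.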

The three boxes

I use friction theory to set out a clear test for which box a quirk belongs in, then run it on something new: visual illusions, using AI models that can see.

The surprise

Classic illusions like the Ebbinghaus and Delboeuf (size) and simultaneous-contrast (brightness) are all about context. A grey square looks darker on a white background than on a black one. People contrast: we judge a thing away from its surroundings, so a square looks darker next to bright things.

The seeing-AI does the reverse. It assimilates: it judges a thing toward its surroundings, so it calls the same square lighter next to bright things. One simple rule explains every case. We push the target away from its context; the AI pulls the target toward it. Wherever the human illusion happens to point the same way as this pulling-toward, the AI looks like it shares our illusion. Wherever the two point in opposite directions, the AI goes the opposite way to us.
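The rule can be written down as a small sign convention. This is only a sketch: the encoding (+1 for "surroundings are bigger or brighter than the target", -1 for the reverse) and the function names are assumptions made for illustration, not something taken from the paper.

```python
def predicted_ai_shift(context_sign):
    """Assimilation: the judged value moves toward the surroundings."""
    return context_sign

def looks_shared(human_shift, context_sign):
    """True when the human illusion happens to point the same way as assimilation."""
    return human_shift == predicted_ai_shift(context_sign)

# Ebbinghaus with big surrounding circles: context_sign = +1.
# People see the centre as smaller (-1); assimilation predicts larger (+1).
ebbinghaus_shared = looks_shared(human_shift=-1, context_sign=+1)

# A hypothetical case where the human illusion itself points toward the context.
same_direction_shared = looks_shared(human_shift=+1, context_sign=+1)
```

With this convention the AI's direction depends only on the context, and whether it "agrees" with us is a by-product of which way the human illusion happens to point.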

Why that matters so much

There is an obvious objection to any claim that AI shares human quirks: maybe the model just read about our illusions in its training text and is parroting them back. If that were true, the model would copy our direction. It does not. It goes the opposite way, smoothly and predictably, as you dial the surroundings up and down. You do not get the opposite of a thing by copying it. So the model is not echoing what it read. It is running its own contextual computation, and that computation differs from ours.

To make the test fair I first check that each model can judge a genuine size or brightness difference when there is no illusion. Only models that pass that check are used, so a flat result cannot just mean the model could not see.
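A minimal sketch of such a gate is shown below. The 0.9 accuracy threshold, the trial format and the function name `passes_control` are all assumptions for illustration; the text does not state the actual criterion used.

```python
def passes_control(trials, threshold=0.9):
    """Return True if the model is accurate enough on illusion-free trials.

    Each trial is a (truth, answer) pair, e.g. ("left", "left").
    The threshold is a hypothetical value, not the paper's.
    """
    correct = sum(1 for truth, answer in trials if truth == answer)
    return correct / len(trials) >= threshold

# Made-up control trials: which of two plain circles is larger?
sharp_model = [("left", "left")] * 5 + [("right", "right")] * 5
guessing_model = [("left", "left")] * 5 + [("right", "left")] * 5

kept = passes_control(sharp_model)
dropped = not passes_control(guessing_model)
```

Filtering this way means a null result on an illusion can be read as "no illusion effect" rather than "the model could not see the stimulus".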

The same flip in three different machines

The striking part is the consistency. The opposite-direction effect appears in three AI models built on three different vision systems and three different language models. As the context is dialled up step by step, the model's judgment slides further the same way every time, and in every one of the nine tests the effect is statistically solid. A much smaller model in the same family does the same thing, just more gently. One illusion, the Müller-Lyer arrows, does not trigger the effect at all, which is useful: it shows the rule is specific to a thing enclosed by its surroundings, not a blanket "the AI gets everything backwards."
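The "dialled up step by step" pattern is a dose-response check: as context strength rises, the judgment should keep moving in one direction. The sketch below fits an ordinary least-squares slope and checks monotonicity. The data are invented purely to show the calculation and are not results from the paper, which may well use a different statistic.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def is_monotonic(ys):
    """True if each value is at least as large as the one before it."""
    return all(b >= a for a, b in zip(ys, ys[1:]))

# Invented example: context strength levels and the model's judged shift.
levels = [0, 1, 2, 3, 4]
shifts = [0.0, 0.1, 0.2, 0.3, 0.4]

trend = slope(levels, shifts)
steady = is_monotonic(shifts)
```

A positive slope under the sign convention "toward the context" would indicate assimilation, and a negative slope contrast; a slope near zero is what an illusion that does not trigger the effect would look like.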

A guess at why

Our eyes are wired for contrast. Neighbouring cells in the retina suppress each other, which sharpens edges and pushes a thing away from its background. That wiring helps an animal recover the true size and shape of something it needs to grab. A seeing-AI has no retina and no such wiring. Its vision system blends nearby patches of the image together by a kind of weighted averaging. Averaging pulls things toward their neighbours, which is exactly assimilation. This is a hypothesis rather than a proven cause, and there are clean follow-up tests to run. Whether the real driver is the averaging itself or a pattern learned from training data is still open.
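A toy numerical example shows why averaging and mutual suppression pull in opposite directions. Both functions below are deliberately simplified stand-ins, with made-up weights; they are not models of a real retina or of any particular vision network, and they illustrate the hypothesis rather than test it.

```python
def weighted_average(center, surround, w=0.3):
    """Blend the target with its surroundings: the result moves toward them."""
    return (1 - w) * center + w * surround

def lateral_inhibition(center, surround, k=0.3):
    """Subtract a share of the surround's excess: the result moves away from it."""
    return center - k * (surround - center)

grey = 0.5  # brightness of the target on a 0 (black) to 1 (white) scale

# Bright surroundings (0.9): averaging lightens the grey, inhibition darkens it.
avg_bright = weighted_average(grey, 0.9)
inh_bright = lateral_inhibition(grey, 0.9)

# Dark surroundings (0.1): the two effects swap sides.
avg_dark = weighted_average(grey, 0.1)
inh_dark = lateral_inhibition(grey, 0.1)
```

Under averaging, the same grey comes out lighter next to bright things and darker next to dark things, which is assimilation; under suppression it comes out the other way round, which is contrast.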

The cite

Pødenphant Lund, T. (2026). Vision-Language Models Assimilate Where Humans Contrast: A Cross-Architecture Signature of Contextual Computation. Zenodo. https://doi.org/10.5281/zenodo.20678296
