Attention as Race-Architecture: attention as the landscape-governed initiation of races
Paper 29 · Pødenphant Lund (2026) · Read on Zenodo
Selective attention is usually described as a filter, a spotlight, or a set of networks. This paper offers a substrate-mechanism reading: attention is the landscape-governed initiation of races. A bounded predictive substrate runs competing prediction-error resolutions; what determines which of them start is the substrate's installed landscape. Attention is not a faculty standing over the races; it is the landscape making certain races start, by allocating read-capacity to them. Commit-order is a downstream readout, not the identity.
| Field | Value |
| --- | --- |
| DOI (concept) | 10.5281/zenodo.20703510 |
| Status | v1 live on Zenodo (2026-06-20) |
| Author | Tomas Pødenphant Lund |
TL;DR
Attention has been theorised as a limited resource, as biased competition among stimuli, as the precision-weighting of prediction errors, and as a set of dissociable networks. This paper takes up a prior, mechanistic question: given a bounded substrate that resolves prediction-error by running competing resolutions ("races") to a commit, what determines which resolutions start in the first place, and what is attention at that level?
The single identity claim, held throughout: attention is the landscape-governed initiation of races. The landscape is the substrate's installed prior-structure (in humans the four fields of Behavioural Friction Theory; in a language model the structure laid down by pre-training and fine-tuning); initiation is the act of starting a race, which in a bounded substrate is the same act as allocating read-capacity to it. Commit-order is downstream.
The account is grounded mechanically in the transformer (the attention-pattern softmax as race-initiation, the output softmax as the downstream commit-race) and tested on language-model substrates. The human-vs-LLM difference is then one of initiation, not maintenance: the LLM has essentially only the computed capacity fields and lacks the fast involuntary capture of the value fields, so it starts a narrower set of races.
The identity claim
Three terms carry the claim. A race is a competing prediction-error resolution the substrate runs toward a commit (the architecture developed across this series). The landscape is the substrate's installed prior-structure, what counts for this substrate as a gradient worth resolving: in humans the four-field structure of Behavioural Friction Theory, in a language model the structure laid down by pre-training and fine-tuning. Initiation is the act of starting a race, which in a bounded substrate is the same act as allocating read-capacity to it.
The claim is deliberately narrow. It is not the relabelling claim that attention simply is the race, nor the weaker claim that the race merely underlies a competitive-selection component of attention. Attention is the landscape causing certain races to start. Commit-order (which of the started races finishes first) is retained in the account as a downstream readout, but it is not what attention is. The language model is used as a clean control: a substrate that runs competitive selection with a reduced, fine-tuning-installed landscape rather than the human four-field one, which lets the paper ask which features of human attention survive a change of landscape.
Mechanism: the two competitions in a transformer
A transformer attention head performs two separable operations (the QK/OV decomposition of mechanistic-interpretability work; Elhage et al. 2021; Olsson et al. 2022). The query–key circuit computes, by a softmax over earlier positions, how much each prior position is read into the current state. That softmax is set by the learned weights, which is to say by the landscape, so it determines which earlier content becomes active, that is, which races start. This is the identity claim instantiated: race-initiation is the landscape-weighted read. The model's other competition is the output softmax over the vocabulary, whose dispersion is the per-token competing-routes (CR) signal; it is the downstream commit-race that reads out which started race finishes first.
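The two competitions can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's instrument: the function names are hypothetical, the read is assumed to be causal and to include the current position, and Shannon entropy is used as one possible dispersion measure for the output softmax (the source does not say which dispersion statistic defines the CR signal).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def read_weights(queries, keys):
    """Query-key read (the initiation competition).

    Row t holds the softmax weights with which positions 0..t are read
    into position t. Assumption: causal mask that includes position t.
    """
    d = len(queries[0])
    rows = []
    for t, q in enumerate(queries):
        scores = [
            sum(qi * ki for qi, ki in zip(q, keys[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        rows.append(softmax(scores))
    return rows

def commit_dispersion(logits):
    """Entropy (nats) of the output softmax (the commit competition).

    Assumption: entropy stands in for the 'dispersion' the text mentions.
    """
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)
```

In this sketch a flat output distribution gives high dispersion (many routes still competing), and a sharply peaked one gives dispersion near zero (one route has effectively won the commit).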
Both competitions have established counterparts in the neuroscience of attention and choice, which is what lets the race reading be more than a redescription. The read/initiation competition is a normalisation-by-pool operation, and the canonical neural model of attention is exactly this (Reynolds & Heeger 2009), an implementation of biased competition (Desimone & Duncan 1995). Softmax belongs to the same divisive-normalisation family, and the correspondence between machine self-attention and the biological normalisation account is already recognised (Lindsay 2020), adopted here rather than claimed. The output-softmax-as-accumulator likeness is offered only as an analogy to the leaky competing accumulator (Usher & McClelland 2001), with the disanalogy flagged: a single feed-forward pass samples no new evidence and carries no diffusion noise. The load-bearing correspondence is the QK–normalisation one.
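The family resemblance between the two normalisations can be written out schematically. The symbols below are illustrative assumptions, not notation from the paper: $q_t$ and $k_s$ are query and key vectors of dimension $d$, $D_i$ is a stimulus drive, $A_i$ an attentional gain, $w_{ij}$ the weights of the suppressive pool, and $\sigma$ a semi-saturation constant.

```latex
% Attention-pattern softmax: position s is read into position t
a_{t,s} = \frac{\exp\!\left(q_t \cdot k_s / \sqrt{d}\right)}
               {\sum_{s' \le t} \exp\!\left(q_t \cdot k_{s'} / \sqrt{d}\right)}

% Schematic divisive normalisation with attentional gain
R_i = \frac{A_i D_i}{\sigma + \sum_j w_{ij}\, A_j D_j}
```

Under these assumptions, setting $\sigma = 0$, taking a uniform pool ($w_{ij} = 1$), and letting the gained drive be exponential in a score, $A_j D_j = \exp(s_j)$, turns the second expression into a softmax. That is the sense in which softmax can be read as one member of the divisive-normalisation family.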
What installs the landscape: the four fields
A race opens when the input carries free-energy gradient against the substrate's current state. Random nonsense, however voluminous, carries no such gradient and opens no race; ambiguous-but-relevant content opens one immediately. Which gradients a landscape privileges is given by the four-field structure of Behavioural Friction Theory (cross-cited, not re-derived): the substrate prioritises candidate races by misclassification cost, and that ordering has the form of four field-specific computations.
- Safety — a threat/approach-avoid computation; its onset is the fast, involuntary capture of attention by change and threat.
- Meaning — selection among candidate directions; its onset is capture by self- and socially-relevant, directionally-significant content.
- Ability — a match of capacity to task-demand; its onset is the pull toward learnable, masterable content.
- Effort — an aggregate processing cost; its onset is a bias away from high-anticipated-cost surfaces.
The two value fields (Safety, Meaning) install from lived experience stored on the substrate's store-and-compute machinery; the two capacity fields (Ability, Effort) are read off the substrate rather than installed by exposure. The parallel to the LLM is exact: architecture and pre-training build a substrate that stores and computes, and fine-tuning installs a value field. This is why a companion constructive test can install a Safety field's gating on an invented nonce domain with no prior valence, while the Ability inverted-U appears in models never trained for it.
Capture, maintenance, and decline as one mechanism
The race reading dissolves the usual capture-versus-maintenance distinction into one mechanism with three observable phases. Capture is race-initiation: the landscape starts a race on a candidate gradient. Maintenance is not a separate force that holds a running race open but the landscape re-initiating an unresolved race until it resolves; a distraction ends the holding precisely because it stops the re-initiation. Curiosity gaps are the engineered case, the open race being the substrate realisation of the information-gap account of curiosity (Loewenstein 1994). Decline is the phase the spotlight metaphor handles least well: a dense surface is read by a cheap anticipatory glance as high-cost before it is engaged and loses the competition for attention before its content is touched. The glance is itself a race, and the anticipation is learned (stored processing-cost becoming avoidance-valence).
Duration across all three phases is governed by resolvability. A trivially-resolved race closes and releases attention (habituation); an overwhelming, unresolvable race repels (overflow); a race in the matched-friction band, open and resolvable but not yet resolved, holds attention through to resolution, the state felt as productive struggle, curiosity, or flow. The optimal curiosity-gap size scales with the audience's prior capacity.
LLM-substrate evidence
The paper reports results of varying strength and flags which is which. The powered, significant anchor is the base-vs-instruct curiosity-gap subtraction.
- Race-on-choice and integration-load (preliminary). On a constant-volume chained-fact instrument where only the number of resolution-hops varies, friction-density rises and accuracy collapses as chain-depth increases, capacity-gated across model families. The surviving signal is the outbound race-length effect (a deeper answer-race loses the commit), instruct-only; the apparent inbound noise effect does not survive its control (scrambled and intact distractors lower the choice identically, so it is generic parse-cost).
- The friction-sign dissociation (human substrate). On the Upworthy Research Archive (91,230 headlines, within-experiment fixed-effects), processing-difficulty friction lowers click-through robustly (surprisal and entropy both negative, p < 0.01); the competing-routes/curiosity side is directionally positive as predicted but not significant once jointly controlled. The processing-difficulty half is reported as robust; the curiosity half is reported as directional but not significant.
- The curiosity gap, base-vs-instruct subtraction (powered). Across five vendor families (111 gap items each), base models run through the gap, committing to under-determined answers; fine-tuning installs a positive gap-registration in all five families, pooled over 555 paired items at +0.17 (t = 5.6, p < 0.0001). The reading: fine-tuning adds a small, robust hedging overlay, a difference in what the landscape re-flags as unresolved, not a capacity to hold a race open.
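To make the pooled figure in the last item concrete, here is a hedged sketch of how a paired statistic of that shape is computed. It assumes the reported t is an ordinary paired t on item-level instruct-minus-base differences; the source does not state the exact pooling procedure, so treat this as an illustration of the arithmetic rather than a reconstruction of the analysis. The function names are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(instruct_scores, base_scores):
    """Mean paired difference and its t statistic (instruct minus base)."""
    diffs = [i - b for i, b in zip(instruct_scores, base_scores)]
    n = len(diffs)
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)  # sample SD, n - 1 in the denominator
    return d_bar, d_bar / se

def implied_sd(mean_diff, t, n):
    """SD of the paired differences implied by a mean, t and n.

    Rearranges t = mean_diff / (sd / sqrt(n)); valid only under the
    paired-t assumption stated above.
    """
    return mean_diff * math.sqrt(n) / t
```

Under that assumption, `implied_sd(0.17, 5.6, 555)` comes out at roughly 0.7, which is the scale of item-level variation a +0.17 mean shift would sit against. Note also that 5 families × 111 items gives the 555 pairs quoted.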
The human-vs-LLM dissociation: an initiation account
The results converge on one through-line: the competitive race is substrate-general, and what fine-tuning installs is a regulatory overlay on the landscape. The difference therefore lies in what initiates. Human attention-initiation is driven by four field-specific race-openers that split by timing, recovering the classic stimulus-driven-versus-goal-directed division (Corbetta & Shulman 2002) and giving it a mechanism: the oldest, highest-cost fields produce fast involuntary capture (Safety by threat; Meaning by self-relevance), while Ability and Effort are slow and computed. An LLM has the computed Ability onset (and an emergent-at-best Effort) but lacks the fast involuntary capture of Safety and Meaning, so it starts a narrower set of races.
Crucially this is one mechanism, not two. There is no separate architectural layer at which a feed-forward substrate fails to hold a race open, because nothing in a bounded predictor holds a running race open over time. What looks like a curiosity gap sustained over minutes is the landscape re-initiating the still-unresolved race; maintenance is repeated initiation over a persistent landscape. A loop test confirms there is nothing architecturally special to hold: across single-pass, chain-of-thought, and explicit agent-loop regimes, the capacity to recognise under-determination tracks compute/reasoning, not a loop-versus-feed-forward boundary (gap-minus-resolved hold-rate: chain-of-thought ≈ +0.57 pooled, loop ≈ +0.41, single-pass ≈ +0.26).
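The gap-minus-resolved hold-rate quoted above is a difference of two proportions. A minimal sketch follows, assuming that each item is scored as a boolean (True when the model withholds a commit, False when it commits) and that "hold-rate" is the fraction of True outcomes; both the scoring rule and the function names are assumptions for illustration.

```python
def hold_rate(outcomes):
    """Fraction of items on which the model withheld a commit."""
    return sum(outcomes) / len(outcomes)

def gap_minus_resolved(gap_outcomes, resolved_outcomes):
    """Hold-rate on under-determined (gap) items minus hold-rate on
    resolved control items. Positive values mean the model holds more
    often when the answer is genuinely under-determined."""
    return hold_rate(gap_outcomes) - hold_rate(resolved_outcomes)

# Toy data, not from the paper: 3 of 4 gap items held, 1 of 4 resolved items held.
toy = gap_minus_resolved([True, True, True, False],
                         [False, False, True, False])
```

Subtracting the resolved-item rate is what separates recognition of under-determination from a general tendency to hedge: a model that hedges on everything scores near zero.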
The developmental vocabulary of assimilation and accommodation names two timescales rather than a hold-versus-commit split. At inference a substrate assimilates: it runs the current landscape over the input as in-context computation. Accommodation, a change to the landscape itself, is offline: in an LLM it is training-time fine-tuning; in a human the slow reload and consolidation of the fields between episodes. The one human capacity with no inference-time analogue in a frozen-weight LLM is exactly this online accommodation.
Predictions and falsifiers
- P29.1 — on the forced-commit gap, base runs through while fine-tuning adds a small registration. Falsifier: base shows a gap-registration equal to or exceeding instruct.
- P29.2 (tested) — no distinct architectural hold-open capacity that an agent loop confers over a single pass; recognition of under-determination is driven by compute. Falsifier: a loop sustains non-resolution markedly beyond any single pass at matched compute.
- P29.3 — on matched-content headline A/B, processing-difficulty friction lowers and curiosity friction raises click-through. Falsifier: both move it the same direction.
- P29.4 — across field-specific capture probes the LLM shows the computed Ability (and perhaps Effort) onset but not the fast involuntary Safety/Meaning capture; humans show all four. Falsifier: the LLM shows fast involuntary threat- or own-name-capture without an installed field.
- P29.5 (run in companion P30) — if a field's function is installed by fine-tuning, field-driven race-initiation emerges. On held-out entities, an installed Safety field raised threat→orient from 0.000 to 0.82 (p ≈ 4e-197) and an "ignore it" instruction could not suppress the opened race.
Scope: owned versus cross-cited
The paper owns the identity claim, the two-competition decomposition (race-initiation and commit) and its causal separability, the field-specific timing-split of attention-initiation, the capture/maintenance/decline duration account, the glance-as-race decline, the friction-sign attention principle, the curiosity-gap base-vs-instruct subtraction, the loop-vs-feed-forward test, and the initiation account of the human-vs-LLM difference. It cross-cites, and does not re-derive, the four-field structure and the misclassification-cost ordering (Pødenphant Lund 2026a §2.9; the field-architecture treatment in §8 of the companion), the operational race-opening predicate, the constructive field-install test, and the companion native-task results. It is positioned as a substrate-level theoretical account with one powered empirical anchor and a set of falsifiable predictions that invite the decisive mechanistic and human experiments. It makes no claim about consciousness.
Read the paper
The full paper is on Zenodo (concept DOI 10.5281/zenodo.20703510):
Read on Zenodo → · Plain English version · Dansk version
Related on this site:
- Paper 0 (BFT) — the four-field architecture (§2.9, §8) this account cross-cites for which gradients the landscape privileges.
- Paper 1 (Friction Theory) — the substrate-universal race architecture whose competing-routes signal this paper reads as the commit-race.
- Paper 13 (Operational FT) — race-opening and recursive resolution; the operational predicate for when an input starts a race.
- Paper 16 (Physics of Learning) — matched-friction and the capacity-match inverted-U (§8) that governs which races hold attention to resolution.