AI that follows the rules — and knows when not to

Two live demos you can try now, and the recipe they are built on

Most attempts to make an AI follow the rules pour the whole rulebook into the model and hope it complies. It does not, and that can be measured. Here are two running demos built on the opposite principle: they look up the one relevant rule when it is needed, answer only when the source covers the question, and hold back or refer to a human when it does not. The goal is not an AI that never fails. It is one that knows where its limit is.

Try them

Both run on an open language model (Qwen2.5-7B) and are free to try. They are demonstrations of the research, not finished products.

The one idea: optimal, not perfect

Following a rule is an internal race between routes, like any other action. The rule's route has to win the races it should win and yield the ones it should yield. An assistant can fail at this in two ways. It can ignore the rule under pressure, so the rule loses a race it should have won. Or it can follow the rule so rigidly that it refuses legitimate requests and becomes useless. An assistant tuned to "never break a rule" lands in the second ditch. The right place is the middle, where it follows the rule when it should and gives way when it should.

The same holds for holding back. A model that answers everything fabricates confident answers when the source is missing. A model trained to always hold back ends up withholding what it actually knows. What works is the selective middle: answer what is covered, hold back on what is not. That is what the demos try to hit.
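To make the selective middle concrete, here is a minimal sketch in Python. It is not the demos' actual code: the rulebook, the `coverage` score (plain word overlap) and the `threshold` value are all assumptions chosen for illustration, and a real system would use a proper retriever and a language model. The sketch looks up the single best-matching rule, answers only when that rule covers the question well enough, and otherwise refers to a human.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    rule_id: str
    text: str


# Hypothetical two-rule rulebook, invented for this illustration.
RULES = [
    Rule("R1", "Refunds are available within 30 days of purchase with a receipt."),
    Rule("R2", "Employees must report a data breach to the security team within 24 hours."),
]

# Small stopword list so that filler words do not count as coverage.
STOPWORDS = {
    "a", "an", "the", "is", "are", "of", "to", "with", "within", "for",
    "i", "my", "can", "do", "how", "what", "must", "be", "in", "on", "and",
}


def tokens(text: str) -> set[str]:
    """Lowercased content words with surrounding punctuation stripped."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def coverage(question: str, rule: Rule) -> float:
    """Fraction of the question's content words that appear in the rule.

    Word overlap is a crude stand-in (an assumption) for a real coverage check.
    """
    q = tokens(question)
    if not q:
        return 0.0
    return len(q & tokens(rule.text)) / len(q)


def answer(question: str, threshold: float = 0.5) -> dict:
    """Look up the single best rule; answer only if it covers the question."""
    best = max(RULES, key=lambda r: coverage(question, r))
    score = coverage(question, best)
    if score >= threshold:
        return {"action": "answer", "rule_id": best.rule_id, "text": best.text}
    # Not covered by any rule: hold back instead of fabricating an answer.
    return {
        "action": "refer_to_human",
        "rule_id": None,
        "text": "No rule covers this question. Please ask a colleague.",
    }


if __name__ == "__main__":
    print(answer("Are refunds available with a receipt?"))
    print(answer("What is the parking policy for visitors?"))
```

The threshold is where the two ditches show up. With `threshold=0.0` the sketch answers every question, including the parking one that no rule covers. With a threshold above 1 it holds back on everything, including the refund question it can answer. The useful setting lies between those extremes.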

How they are built — the recipe

The architecture is the same in both, and every choice has a reason in the research.

The research behind it

The demos build on a set of findings about how language models work inside:

A full paper on the architecture is in preparation.

The demos were built by Tomas Lund. If your organisation has a similar task, you are welcome to write to tomas.lund@frictiontheory.org.