Which prompting trick helps your AI — and when it backfires

The same trick can help a smaller model and hurt a more capable one. The skill is matching the technique to where your model sits.

There are lots of tricks for getting better answers out of a language model: tell it to think step by step, have it check its own work, ask it to spot false assumptions. Most guides present these as simply "good." They aren't. The honest version is more useful:

A trick is only good when it fits the model

A reasoning trick is not good or bad on its own: the same trick helps or hurts depending on how capable your model already is. A nudge that helps a mid-sized model is just noise to a top model, and a wrong nudge can actively mislead it. The main point: the instruction you give the model is the real lever. Reshuffling the information you give it rarely changes much on its own.

Telling the model what to do (the real lever)

Say what the facts are for, first

What: before handing over information, tell the model what it's going to do with it.
Helps: mid-sized models on multi-step problems. It sets them up for the work ahead.
Backfires: top models that have already organised the problem; a wrong framing just misleads them.
Example: "You'll use these facts to work out X. [facts] Now: [question]"
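
As a rough illustration, the template above could be assembled like this. The function name, its parameters, and the train facts are made up for this sketch; only the "purpose, then facts, then question" ordering comes from the example.

```python
from typing import List


def purpose_first_prompt(purpose: str, facts: List[str], question: str) -> str:
    """Build a prompt that says what the facts are for before listing them.

    Hypothetical helper: the name and signature are illustrative only.
    """
    fact_lines = "\n".join(f"- {fact}" for fact in facts)
    return (
        f"You'll use these facts to work out {purpose}.\n\n"
        f"{fact_lines}\n\n"
        f"Now: {question}"
    )


# Invented example data, purely to show the shape of the result.
prompt = purpose_first_prompt(
    purpose="which train arrives first",
    facts=[
        "Train A leaves at 9:00 and takes 2 hours.",
        "Train B leaves at 9:30 and takes 1 hour.",
    ],
    question="Which train arrives first?",
)
print(prompt)
```

The purpose line comes first on purpose: the model reads what the facts are for before it reads the facts.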

Ask it to check the premise

What: tell the model to flag a false assumption instead of playing along, then answer.
Helps: mid-sized models, both at catching false-premise questions and at hard multiple-choice questions.
Backfires: the most capable models, which already do this. The instruction just gets in the way.
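
One way to act on this is to add the premise check only when you believe your model is mid-sized. The sketch below assumes you have already decided which tier your model is in; the tier labels "mid" and "top", the instruction wording, and the function name are all assumptions for illustration.

```python
# Hypothetical wording for the instruction; adjust to taste.
PREMISE_CHECK = (
    "Before answering, check whether the question rests on a false "
    "assumption. If it does, say so, then answer."
)


def with_premise_check(question: str, model_tier: str) -> str:
    """Prepend the premise-check instruction for mid-sized models only.

    `model_tier` is a label you assign yourself ("mid" or "top" here);
    it is not something a model or API reports.
    """
    if model_tier == "mid":
        return f"{PREMISE_CHECK}\n\n{question}"
    # The most capable models already do this, so leave the question as is.
    return question


question = "Why did the Moon landing happen in 1972?"
print(with_premise_check(question, "mid"))
print(with_premise_check(question, "top"))
```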

Reasoning routines (pick by the kind of question)

Think step by step (chain-of-thought)

Helps: questions with a distracting detail, or a close call between options.
Hurts: genuinely ambiguous questions. The extra reasoning invents false certainty where "it depends" is the honest answer.

Check your own answer (self-critique)

Helps: questions where the model doesn't actually have enough to go on. It catches made-up answers.
Hurts: close calls. Second-guessing can overturn an answer that was uncertain but right.

Assume you're wrong — why? (pre-mortem)

Helps: shaking loose an answer the model knows but did not give the first time.
Cost: it often talks the model out of an answer that was already right. Use it to rescue, not to double-check everything.
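
The three routines above can be read as a small lookup from the kind of question to the routine that helps. The sketch below is one possible encoding; the question-kind labels and routine names are invented for illustration, and deciding which kind a question is remains up to you.

```python
from typing import Optional

# Question kind -> routine that helps, per the lists above.
# The string labels are hypothetical names for this sketch.
HELPS = {
    "distracting_detail": "chain_of_thought",
    "close_call": "chain_of_thought",
    "not_enough_information": "self_critique",
}


def pick_routine(question_kind: str, rescuing: bool = False) -> Optional[str]:
    """Return the routine to try, or None when no routine is advised.

    `rescuing` means a first answer has already come back and you suspect
    the model knows better; only then is the pre-mortem worth its cost.
    """
    if rescuing:
        return "pre_mortem"
    # Ambiguous questions (and any unlisted kind) get no routine:
    # extra reasoning there invents false certainty.
    return HELPS.get(question_kind)


print(pick_routine("close_call"))
print(pick_routine("ambiguous"))
print(pick_routine("close_call", rescuing=True))
```

Note that "close_call" maps to chain-of-thought and never to self-critique, since self-critique is listed as hurting close calls.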

Things that backfire

Forcing a long, rigid answer format

Retraining a model to always answer in a fixed multi-part format (answer, then facts, then verification, then conclusion…) breaks bigger models: one large model dropped from 86% to 3% on a general test as we trained it harder on the format. The format fights how the model naturally writes, and the training forces it onto every answer. (Its effect on small models is still being tested.)

A framing that fights the task

Giving a purpose that contradicts the real task ("for a children's story…" on a technical question) hurts bigger models. They notice the conflict and it disrupts the answer. Smaller models just ignore it.

Stacking lots of instructions

Piling on several framing instructions does not add up their benefits. You pay the cost of all of them and get the benefit of none. One good instruction beats three stacked.

Just reshuffling the information

On models of the ChatGPT type, where you hand over all the facts in the task itself, reordering the same facts barely changes the answer; the direction comes from the instruction. But reordering is not inert everywhere: on other kinds of model and on multi-step tasks the order can help substantially, and it measurably shifts the model's internal state regardless of type.

The bigger picture

All of the above is the easy half: the techniques. The hard half is figuring out where your particular model sits, so you know which technique will help it. That is a whole field of research in itself. The practical takeaway you can use today: don't ask "is this trick good?" Ask "where is my model, and does this trick help there?"
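
A crude way to ask "does this trick help there?" is to run the same questions with and without the trick and compare. The sketch below is not the method from the research; it is a minimal comparison under stated assumptions: `ask` is whatever function calls your model, answers are scored by exact match, and `fake_model` is a stand-in so the example runs without any API.

```python
from typing import Callable, List, Tuple

Items = List[Tuple[str, str]]  # (question, expected answer) pairs


def accuracy(ask: Callable[[str], str], items: Items,
             wrap: Callable[[str], str]) -> float:
    """Share of questions answered exactly right after applying `wrap`."""
    correct = sum(1 for q, expected in items if ask(wrap(q)).strip() == expected)
    return correct / len(items)


def trick_helps(ask: Callable[[str], str], items: Items,
                trick: Callable[[str], str]) -> bool:
    """True if the trick beats the plain prompt on these questions."""
    baseline = accuracy(ask, items, lambda q: q)
    with_trick = accuracy(ask, items, trick)
    return with_trick > baseline


def fake_model(prompt: str) -> str:
    # Stand-in for a real model call: only answers correctly when nudged.
    return "4" if "step by step" in prompt else "5"


def add_step_by_step(question: str) -> str:
    return f"Think step by step. {question}"


items = [("What is 2 + 2?", "4")]
print(trick_helps(fake_model, items, add_step_by_step))
```

With a real model you would swap `fake_model` for your own call and use enough questions of the relevant kind for the comparison to mean something.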


Based on the friction-theory series (Tomas Pødenphant Lund, 2026; Paper 4C in preparation). For the numbers and sources, see the technical version.