AI is the engine — you are the pilot

How to get something reliable out of AI instead of something generic

Most people think good AI use is about finding exactly the right wording. It is not. It is about building control around a strong but unreliable engine, so you can trust what comes out. Here are the methods I have gathered over a couple of years of using AI to build software, read research, and write: how to feed the model, how to check it, and how to get it to think in new directions.

What is the biggest misconception about artificial intelligence right now? If you ask me, it is the idea that good use comes down to asking for a better text: that the whole art lies in finding exactly the right phrasing so the chatbot spits out something usable. A lot of people believe that, enthusiasts and sceptics alike.

But that is not what separates good use from bad. When anyone can ask a chatbot to write something, the interesting question is not whether you use AI, but how you ensure the quality. And that is something you can learn, systematise, and get good at.

I have worked with e-learning and compliance for 25 years, and over the past couple of years I have used AI intensively to build software, to read research, and to write. Along the way I have gathered a handful of methods that work. This article collects them.

The thread running through it is simple: AI is the engine, you are the pilot. The engine is powerful, but it does not steer. The whole point is to build control around the engine so that you can rely on it, and the result can be more reliable than most people think.

Where does it go wrong for most people?

Let us start with the mistake, because it is so widespread it deserves a name. You have a piece of content: a report, a case file, some notes. You paste it into a chatbot that knows nothing about the context and ask it to "make something". And you get back something that sounds smooth but is generic and useless.

That is a waste. And it is worse than a waste if you pass it on to a colleague. Here is a point from my own research worth taking with you: the value of information only appears once the reader can see that it is worth the effort to read. We decide, before we read, whether a document deserves our attention. If your colleague gets a text that smells of unfiltered AI output, they will not invest the attention, however much work went into it.

So the first rule is not an AI rule. It is a courtesy rule: never hand someone a piece of generic AI text and ask them to find the meaning in it. Boil it down yourself first, and you save them the work. Leave it, and you have just moved your own effort onto them.

The rest of the article is about avoiding exactly that, by working on what goes into the model, and by controlling what comes out.

How do you get something into the model that is worth working with?

The most important part happens before the model answers at all. Generic in, generic out. So here are the ways I feed it.

I talk instead of typing

This is the most underrated habit I have. I dictate to the model. I talk to it the way I would talk to a person, and let the speech become text.

It sounds trivial, but it changes everything. We speak roughly three times as fast as we type. More importantly, when you talk, the whole context comes along: all the caveats, asides, and "this connects to…" that you would never bother to type. And it is exactly that context that makes the difference between a generic answer and a sharp one.

This article was actually dictated. I talked at length about what I wanted, and then let the model shape it. It is a completely different way of working than sitting and hammering out a perfect prompt. (Practical note: there are tools that make speech-to-text easy right on your computer. What matters is not the tool, but the habit of talking rather than typing.)

I talk with the content

One of the strongest things you can use AI for is not to get it to produce something, but to ask it about something you have given it. I drop in a long report, a contract, or a research article and put questions to it. "What is the main claim here?" "Where does it contradict itself?" "What would a critic object to?" It is like having a patient conversation partner who has read the whole thing and never tires of follow-up questions.

I let it boil large amounts down

Sometimes you simply have too much to read. Here AI is genuinely strong: it can take a big pile and give you the overview, so you can decide what is worth reading in full.

I use it to translate, including the hard stuff

A concrete example: someone wrote to me that something he had read reminded him of my own research. It was a scientific article, in Russian. In the past it would have been a closed door. With AI I could read it, understand it, and see whether he was right.

Another example: a long PhD thesis on training sound. I had AI set it against my own theory, finding differences, parallels, and explanations. It gave me a perspective I had not seen myself. One important caveat, which I return to: I did not trust it blindly. I used it as a source to look things up and understand, not as an answer key. But as a way into something otherwise out of reach, it is invaluable.

I ask it to match a particular style

Here is a small technique that is a point in itself: you can give the model an example of a text you have written and ask it to match the style. That is how this article came about, written so it resembles my own blog posts. It works because you move the model from "the average of the internet" to "this is how I sound". And that is the whole difference between something generic and something that is yours.

How do you check whether it is right?

Now comes the part most people skip. An AI text can sound completely convincing and still be wrong. The model sometimes invents facts (it hallucinates), and it does so in the same confident voice as everything else. That is why this is the most important section in the article. It is about treating the AI as a capable but unreliable employee whose work you always quality-check.

Build an answer key the model never sees

How do you know whether your AI system is actually right, and does not just sound right? You build an answer key: a set of examples where the answer has been checked by a human. Then you run the AI against the key and measure how often it gets the answer right. The crucial thing is that the model never sees the key. Otherwise you fool yourself: it "passes the test" because it has seen the answer. This is the difference between hoping it is good enough and knowing it. It turns a vague question ("is it any good?") into a number you can track over time.
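The scoring step can be sketched in a few lines of Python. This is a minimal illustration, not the system described above: the questions in ANSWER_KEY, the function name score_against_key, and the stand-in model always_yes are all assumptions made up for the example. The thing to notice is that only the question is handed to the model, while the checked answer stays on the scoring side.

```python
from typing import Callable

# Hypothetical answer key: questions paired with human-checked answers.
ANSWER_KEY = [
    ("Is a signed consent form required before data collection?", "yes"),
    ("May staff share passwords with a colleague on leave?", "no"),
    ("Must incidents be reported within 72 hours?", "yes"),
]


def score_against_key(ask_model: Callable[[str], str],
                      key: list[tuple[str, str]]) -> float:
    """Return the share of questions the model answers correctly.

    Only the question is passed to the model. The checked answer is
    compared afterwards, so the model never sees the key.
    """
    if not key:
        raise ValueError("answer key is empty")
    correct = 0
    for question, checked_answer in key:
        reply = ask_model(question)
        if reply.strip().lower() == checked_answer.strip().lower():
            correct += 1
    return correct / len(key)


def always_yes(question: str) -> str:
    """Stand-in for a real model call; it answers "Yes" to everything."""
    return "Yes"


accuracy = score_against_key(always_yes, ANSWER_KEY)
print(f"accuracy: {accuracy:.0%}")  # accuracy: 67%
```

Exact string matching only works for short, closed answers like these. For free-text answers you would need a looser comparison, which is a design decision the sketch leaves open.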

Let a different AI judge

A model is a poor judge of its own work: it has the same blind spots in both roles. The fix is to let a model from a different vendor assess the result. Different models have different weaknesses, so when two independent ones agree, that is a far stronger signal than one model's self-confidence.

An example from my own work: I do my research with Claude, and then I set ChatGPT and Gemini on it for what I call a hostile review. I ask them to find as many holes in my argument as they possibly can and to shoot the sources down as hard as they can. What still stands once two independent models have done their worst to tear it down is far stronger than anything a single model is happy with.
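The bookkeeping behind such a hostile review might look roughly like the sketch below. It makes no real API calls: reviewer_a and reviewer_b are hypothetical stand-ins with hard-coded checks, where in practice each would send the draft to a different vendor's model with an instruction to attack it. The names hostile_review and survives are also assumptions for illustration.

```python
from typing import Callable

# A reviewer takes a draft and returns a list of objections (empty = none found).
Reviewer = Callable[[str], list[str]]


def hostile_review(draft: str, reviewers: dict[str, Reviewer]) -> dict[str, list[str]]:
    """Run every reviewer on the same draft and collect their objections."""
    return {name: review(draft) for name, review in reviewers.items()}


def survives(report: dict[str, list[str]]) -> bool:
    """A draft survives only if no reviewer raised any objection."""
    return all(len(objections) == 0 for objections in report.values())


# Hypothetical stand-ins. In real use each would call a different vendor's model.
def reviewer_a(draft: str) -> list[str]:
    if "always" in draft.lower():
        return ["Absolute claim: 'always' needs evidence or softening."]
    return []


def reviewer_b(draft: str) -> list[str]:
    if "source:" not in draft.lower():
        return ["No source is cited for the claim."]
    return []


REVIEWERS = {"reviewer_a": reviewer_a, "reviewer_b": reviewer_b}

first_draft = "A rule buried among many others is always ignored."
revised_draft = ("In one experiment, adherence to a buried rule fell "
                 "from 100% to 0% (source: own experiment).")

first_report = hostile_review(first_draft, REVIEWERS)
revised_report = hostile_review(revised_draft, REVIEWERS)

print(survives(first_report))    # False
print(survives(revised_report))  # True
```

The objections are leads for the author to check, not verdicts. A reviewer can be wrong too, so the final call on each objection stays with a human.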

Get it to say "I don't know"

The most dangerous AI text is the kind that sounds right but is guessed. So I ask it to flag what it does not know, rather than make something up. In one of my systems the model inserts small [adapt] markers where information is missing, instead of inventing it. That is honest. The user knows exactly what they have to fill in, and the system never lies on their behalf. Trust does not come from smooth confidence. It comes from the model daring to say "I don't know".
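On the receiving side, the markers are easy to pick up in code. The sketch below assumes the marker is the literal string [adapt], as described above. The wording of INSTRUCTION, the sample draft, and the function name open_markers are invented for illustration and are not taken from the actual system.

```python
MARKER = "[adapt]"

# Hypothetical instruction text; the wording is an assumption.
INSTRUCTION = (
    "If a fact is missing from the material, write [adapt] in its place. "
    "Do not guess."
)


def open_markers(draft: str) -> list[tuple[int, int, str]]:
    """List every line that still contains a marker.

    Each entry is (line number, number of markers on that line, line text).
    """
    found = []
    for number, line in enumerate(draft.splitlines(), start=1):
        count = line.count(MARKER)
        if count:
            found.append((number, count, line.strip()))
    return found


# Made-up model output with two gaps the model chose not to fill.
draft = """Welcome to the onboarding course.
Report incidents to [adapt] within [adapt] hours.
Thank you for taking part."""

for number, count, text in open_markers(draft):
    print(f"line {number}: {count} gap(s) to fill -> {text}")
```

A check like this can block a text from being sent while any marker remains, so a gap is never passed on by accident.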

Let it test, but don't take it at its word

You can even let an AI use a product and report where it gets stuck, as a kind of automated usability test. I have done this, and it found a real bug: beginners gave up at a particular spot because the system promised "10-20 seconds" but took up to a minute, so they thought it had frozen. But the same test also found a "bug" that was not there at all. The model thought a button did not work, when the failure was a technical artefact of the test itself. The lesson is the same all the way through: use the AI's findings as leads, not as verdicts. Check them against reality.

Can you get AI to think in new directions?

So far we have talked about making AI reliable. But can you get it to be creative, to come up with something that is not just the most obvious thing? Yes. And let me be clear here: this is not a gimmick, it is evidence-based. The techniques come from creativity research on humans, and I have tested them on language models myself. In an experiment with 1,760 runs, every one of the research-based techniques beat the "raw" approach on both originality and quality.

The core insight is that a model, like a person, has an obvious first choice. A kind of gravity well it falls into. Creativity is about getting away from that well. The research points to three ways:

The lovely thing is that the same moves a good workshop facilitator uses on a room of people work on a language model. There is no magic in it, just known psychology, translated to a new kind of brain.

When should you not trust it?

Here is a point especially for those who are, rightly, sceptical about handing important work to AI. Because there are tasks where an ordinary chatbot demonstrably cannot deliver what you need.

Take compliance. The usual reflex is to pour the whole rulebook into the system "so we are covered". It does not work. I have shown it in my research: when a rule is buried among many competing rules, it loses, even for the strongest models. In one experiment a model's adherence to a particular rule fell from 100% to 0%, simply because the rule sat alongside a pile of others.

It is not because the model is lazy. It is a basic condition: a brain, artificial or human, cannot hold an unlimited rule set in mind at once and apply it flawlessly. This holds for humans too. It is why NIST, the American standards body, dropped the requirement for complicated passwords that have to be changed all the time: it gave worse security in practice, because people just reused variants of the same one.

So what do you do if you need to be sure? You look it up instead of memorising it. Rather than cramming in all the rules at once, the system pulls up only the one rule relevant right now. That technique is called RAG, short for retrieval-augmented generation: the model looks up a source before it answers. When we did that, adherence went back to around 100%.
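The look-up step can be illustrated with a deliberately simple sketch. Real retrieval systems usually search with embeddings. Here plain word overlap stands in for that search, which is a swapped-in simplification and not the method used in the experiment. The three rules, the function names, and the prompt wording are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical rulebook. In practice this could hold hundreds of rules.
RULES = {
    "passwords": "Never share your password with colleagues.",
    "visitors": "Visitors must be signed in at reception and escorted.",
    "incidents": "Report a suspected data breach to the security team immediately.",
}


def words(text: str) -> set[str]:
    """Lower-case the text and return its distinct words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve_rule(question: str, rules: dict[str, str]) -> Optional[str]:
    """Return the single rule sharing the most words with the question.

    Word overlap is a stand-in for a real embedding search.
    Returns None when no rule shares any word with the question.
    """
    asked = words(question)
    scored = [(len(asked & words(text)), text) for text in rules.values()]
    best_score, best_text = max(scored, key=lambda pair: pair[0])
    return best_text if best_score > 0 else None


def build_prompt(question: str, rules: dict[str, str]) -> str:
    """Put only the one relevant rule in front of the model, not the rulebook."""
    rule = retrieve_rule(question, rules)
    if rule is None:
        return f"No matching rule was found. Say that you do not know.\n\nQuestion: {question}"
    return f"Apply this rule and no other:\n{rule}\n\nQuestion: {question}"


print(build_prompt("Can I share my password with a colleague?", RULES))
```

The design choice is that the model only ever sees one rule at a time, so that rule has nothing to compete with. When nothing matches, the prompt asks for an honest "I do not know" instead of a guess.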

The point for you as a sceptic: it is true that you cannot trust an ordinary language model 100% for rule work, and you have to design your way out of that, not hope your way out. If anyone promises you flawless compliance from a plain chatbot, you now know it does not hold. The mature approach is to know the limit and build the control around it: look things up, validate what can be checked, and train what has to stick. It is the same point as the rest of the article, put sharply: the AI is the engine, the control is yours.

The one rule that ties it all together

If you take only one thing away, let it be this: be just as critical of AI as you would be of another person.

Think about it. If a colleague handed you a text and said "here, use it with your name on it", would you just send it out without reading it? Of course not. You would read it, check it, fix what you disagreed with, because you were the one named as the sender.

Treat AI exactly like that. Not with more distrust than a person, but not with less either. You must never pass on something you do not stand behind yourself. Not because AI is especially unreliable, but because you are the sender, and the responsibility is yours.

It is actually a freeing way to see it. You do not have to choose between "AI is magic" and "AI is dangerous". You can do what you already do with any other work you put your name on: use what is good, check it thoroughly, and take ownership of the result. AI is a powerful engine. But it is still you who flies.

The main takeaways
