You ask the AI to fix a bug in your service. It comes back with a clean, confident answer. You paste it in. It doesn't work, and worse, now there's a second bug. You tell yourself this model is overhyped and go back to Stack Overflow.
But here's the thing. The model isn't broken. Your question was.
If you've ever watched a senior engineer interrogate a junior at code review, you've already seen the skill that matters most when working with AI. Senior engineers don't ask "is this code good?". They ask "what happens when the user is on a slow connection and refreshes mid-form?". Same idea, completely different signal-to-noise ratio. The model is the same junior engineer, except with infinite patience and a much faster typing speed. The work is on you to frame the question.
After enough trial and error, the same four ingredients keep showing up in every prompt that returns something genuinely useful: context, constraints, examples, expected output. Get all four right and you stop wrestling with AI. Skip any one of them and you're back to copying answers that almost compile.
Let's break each one down.
Why Generic Questions Get Generic Answers
Before the four ingredients, a quick reframe. The model is not searching the internet for you. It's predicting the most likely continuation of your prompt, given everything it has seen during training plus whatever context you handed it in this conversation.
Think about what that means in practice. When you write "how do I fix this error", the most likely continuation is the most generic possible explanation, because that's the average of every "how do I fix this error" prompt the model has ever seen. You get the median answer to the median version of your question.
If you want a non-median answer, you have to give a non-median prompt. That's the whole game.
How do I make my API faster?
I have a Node.js Express endpoint that fetches a user's last 50 orders and their line items. It currently does one query for the user, one for orders, and N queries for line items (one per order). It takes 1.2s on average and ~3s on p99. The DB is Postgres 15. Orders has 8M rows, line_items has 60M rows. I cannot change the table schema this quarter. Show me three options ranked by impact, with code for the top one.
The second prompt isn't longer because longer prompts magically work better. It's longer because it answers the four questions the model would otherwise have to guess: what's the situation, what's allowed, what does success look like, and what shape should the answer take. Those are the four ingredients.
Ingredient One: Context
Context is where the question lives.
Without context, the model has to assume the median version of your stack, the median version of your codebase, and the median version of your skill level. That's three coin flips it has to win before it even starts answering.
The most useful context to put in front of the model, in rough priority:
- The exact code the question is about. Not a paraphrase. Paste the function, paste the failing test, paste the actual error. If it's too long, paste the chunk that matters and tell the model what the rest does in one line.
- What you're trying to accomplish, in plain language. Not the immediate sub-task, the user-facing goal. "I'm trying to let users cancel an order within 30 minutes of placing it, but only if it hasn't shipped."
- What you've already tried. Saves the model from offering the answer you already ruled out. "I considered using a cron job that flips status every minute but I don't want a 60-second window where cancellation is silently denied."
- The environment. Language and version, framework and version, deployment target, database, key libraries. Not every detail, just the ones that constrain the answer.
- The team constraints the model can't see. "This codebase is migrating off TypeORM to Drizzle; new code should be Drizzle.". Or "We support IE11 (don't ask).". Or "Senior teammates will review this PR, keep the API surface small."
Here's a side-by-side. Same problem, two levels of context.
My React component re-renders too much. How do I fix it?
I'm working on a React 18 dashboard at work. There's a <TransactionTable> component that takes a `transactions` array prop (about 800 rows) and renders them with virtualization via @tanstack/react-virtual. The parent component re-fetches every 30 seconds and passes a freshly-built array down. I'm seeing the entire table flash even though only 1-2 rows actually changed.
I've already tried wrapping TransactionTable in React.memo, didn't help, because the array reference is new every fetch.
I want to keep the 30-second refresh. I don't want to introduce Redux or Zustand just for this. What's the idiomatic React-18-only way to fix this?
The second prompt locks down React's version (so the model knows it can use useDeferredValue and useSyncExternalStore), tells the model that React.memo isn't the fix (so it doesn't waste two paragraphs suggesting it), rules out a state library (so the answer stays in scope), and pins the symptom precisely (so the model doesn't go off explaining the general theory of re-renders).
You will get back a useful answer to the second prompt almost every time. Almost no one gets a useful answer to the first.
One pattern I keep coming back to is what I call the "new engineer test". Before sending a prompt, ask yourself: if a new engineer joined the team this morning, would I send them this same message to figure out the problem? If the answer is "no, I'd also link them to the design doc, paste the error, and tell them which constraints matter", then your prompt is missing the same things.
Ingredient Two: Constraints
Constraints are the rules of the answer.
If context tells the model where the problem lives, constraints tell it which corner of the solution space is in bounds. Without explicit constraints, the model produces the technically-correct-but-unusable answer. Use Redis. Add a queue. Rewrite it in Rust. All real options, none of them useful when you have to ship a fix by Friday.
The constraints that come up most often:
- Time and effort. "I have 30 minutes before standup." "This needs to ship today; full rewrite is off the table." "Refactor is fine, take your time."
- Allowed technologies. "Pure SQL, no ORM." "Standard library only, no extra dependencies." "Must work in our existing Lambda, no Docker, no native modules."
- Forbidden technologies. "We don't use Redux." "No
eval, even hidden inside template strings." "No raw SQL, we use the query builder." - Performance budgets. "This loop runs 10k times per request, needs to be O(n)." "p99 latency budget is 200ms total, this step gets 50ms." "Memory tight, service runs in 128MB."
- Style and team conventions. "Match the existing pattern in
userService.ts." "We prefer composition over inheritance here." "No new abstractions, this is a one-off." - Audience. "Explain it like the reviewer is the engineer who wrote the original code five years ago." "This goes in a junior onboarding doc." "Senior architect will read this, assume system-design fluency."
Watch what happens when you add constraints to a request.
Write me a rate limiter in Python.
Write me a token-bucket rate limiter in Python 3.11, using only the standard library. It will be used inside a FastAPI dependency to limit each authenticated user to 60 requests per minute. State must be shared across multiple uvicorn workers, so an in-memory dict is not enough, we have a Redis 7 instance available. Optimise for code clarity over micro-performance; we read this code more than we write it. Show the implementation and one pytest test.
The first version asks the model to make ten design decisions on your behalf. The second version locks all ten down, so the model can spend its attention on the algorithm and the test rather than on the architecture.
There's a subtle thing constraints do that's easy to miss. They give the model permission to skip the disclaimer. If you don't tell the model "this is the only stack I can use", a well-tuned model will start with "there are several approaches..." and walk through three of them before getting to one. That's correct behaviour when the question is genuinely open, and pure noise when you already know what you can use.
Ingredient Three: Examples
Examples anchor the abstract.
There is a wide gap between describing a thing in words and showing what it looks like. The model bridges that gap when you describe a thing. It does not need to bridge it when you show one, and showing one is almost always cheaper.
Three places examples earn their weight:
1. The input you have, and the output you want. If you're asking for a transformation, paste a tiny realistic input and the matching realistic output. No more "for example, something like this".
Write a function that normalizes user names, handles spaces, capitalisation, that kind of thing.
Write a JS function `normalizeName(input: string): string` that turns user-entered names into canonical form. Examples:
" john DOE " → "John Doe"
"MARY-jane O'Sullivan" → "Mary-Jane O'Sullivan"
"jean-luc picard" → "Jean-Luc Picard"
"李小龍" → "李小龍"
"" → ""
So: trim, collapse internal whitespace, title-case each word, but preserve hyphens, apostrophes, and non-Latin characters as-is.
The first version leaves you negotiating the apostrophe-and-hyphen behaviour over three follow-up messages. The second version gets it right on the first try because there's nothing left to guess.
2. A "looks like" sample. When you want generated code to match an existing style, paste a small sample of the style. "Match this tone." "Use the same comment style." "Follow the same naming conventions."
// We name route handlers like this:
export async function listOrdersForUser(req: Request, res: Response) {
const userId = req.params.userId;
const orders = await ordersService.listForUser(userId);
res.json({ orders });
}
// Now write the handler for `getOrderByIdForUser` following the same shape.
The model now has a concrete pattern to follow, not a verbal description of one. Way fewer surprises.
3. A counter-example. Sometimes the fastest way to point at what you want is to point at what you don't want.
I want a regex that matches valid US ZIP codes. NOT this one, which I keep seeing online without an end anchor, so a string like "12345-" matches the first five digits and the trailing dash is silently ignored:
^\d{5}(-\d{4})?
I want one that rejects trailing dashes too.
Examples are the cheapest, most underused tool in prompting. If you spend ninety seconds writing two concrete input/output pairs, you usually skip three rounds of follow-up.
Ingredient Four: Expected Output
Expected output is the shape of the answer.
Most prompts don't say what the answer should look like, and so the model picks a shape from its training data. That shape is usually: a few paragraphs of preamble, three or four labelled approaches, a code block, a wrap-up. It's not wrong, but it's not always what you wanted either.
Tell the model the shape, and you get the shape.
The most useful axes to control:
- Format. Code only. Markdown table. JSON matching this schema. Numbered steps. Mermaid diagram. Plain prose, no headings.
- Length. "Under 200 words." "One paragraph." "As long as it needs to be, but no filler."
- Structure. "Start with the answer, then the reasoning." "List three options, then your recommendation, then why." "Just the diff."
- What to include. "Include the tests." "Include error handling for the network failure case." "Include a one-line summary of what each line of the regex does."
- What to leave out. "No introduction." "Don't restate the question." "Skip the disclaimer about consulting a security expert."
This one is the highest-leverage of the four when you're scripting AI into a workflow. If you're calling a model from code, you usually want structured output, JSON matching a schema, or a fenced block in a known language. The model can absolutely do that, but only if you ask.
Analyse this commit message and return ONLY a JSON object matching this schema:
{
"type": "feat" | "fix" | "chore" | "refactor" | "docs" | "test",
"breaking": boolean,
"summary": string // 8 words or fewer
}
Commit message:
"""
Add retry logic to checkout webhook handler, includes breaking change to webhook signature header
"""
Return only the JSON. No commentary, no markdown code fence, no preamble.
The same prompt with no shape instructions returns something like "Sure! Here's my analysis of the commit message: it's a feature, but it also includes a breaking change...", which is unparseable, and is going to crash your pipeline at 2am.
For interactive use, the shape instruction matters less for parseability and more for attention. You're going to read the answer. Make sure the shape is one you can scan fast.
Review my pull request below for security issues only, not style, not performance, not naming. For each issue, give me:
- file:line
- one-sentence description of the risk
- severity: low | medium | high
- a suggested fix in three lines or fewer
If there are no security issues, say "no findings" and stop.
You'll get a tight scannable list. No "let me start by saying this is generally well-written code", no architectural digressions. The model is happy to be brief when you tell it to be.
Putting It Together
Here's a real-shaped before/after. Same problem in both. The "before" is what you'd type at the end of a long day. The "after" is what you'd type if you stopped to think for two minutes first.
my login is slow help
# Context
I have a Go 1.22 HTTP service handling user login. The endpoint is /v1/sessions. It takes 800ms p50 and ~2.5s p99 under normal load. The code is below.
[paste of the handler, ~40 lines]
The bcrypt cost factor is 12, which I think is the main culprit, but I'm not sure. The DB is Postgres 15, single instance, same network, ~3ms ping.
# Constraints
- Cannot lower bcrypt cost, security team set that number after a threat-model review.
- Cannot change the DB schema this sprint.
- I can add caching or a side queue, but I'd rather not introduce Redis just for this if there's a simpler fix.
- Solution must be deployable behind a feature flag.
# Examples
Here is what a successful login response looks like today:
POST /v1/sessions
→ 200 OK
{ "session_token": "...", "expires_at": "2026-05-15T15:30:00Z" }
I want the response shape unchanged.
# Expected output
Give me, in order:
1. A one-sentence diagnosis of what's most likely causing the latency.
2. Up to three remediation options, each with a rough effort estimate (hours) and expected p99 impact.
3. The code for your top recommended option.
4. One sentence on what could go wrong with the recommendation in production.
No general explanation of how bcrypt works. No suggestion to switch to argon2id, I know about it, it's out of scope.
You can feel the difference just reading it. The model now has everything it needs. The "before" version costs you a fifteen-minute back-and-forth before you get anywhere useful. The "after" version returns a usable plan on the first try.

There's nothing magical about this four-part structure. You don't need to literally label your prompts "Context / Constraints / Examples / Expected output" every time, although honestly it doesn't hurt. The point is to feel where each ingredient is missing.
When the model's answer drifts off into "let me first explain what bcrypt is", your expected output wasn't sharp enough. When it suggests Redis even though you ruled it out, your constraints weren't loud enough. When it asks "what's your database?", you skipped context. When it hands back a regex that breaks on apostrophes, you needed examples.
Diagnosing your own prompt the way you'd diagnose a code review is what moves you from intermediate to advanced at this.
The Two-Minute Investment
Here's the part most people skip. Writing a good question takes effort. Not a lot of effort, maybe two minutes on top of however long you would have spent on a one-liner, but more than zero. And there's a real temptation to fire off the one-liner anyway, because typing the longer version feels like you're doing the model's job for it.
You are. That's the trade. The two minutes you spend giving the model context and constraints buys you a useful answer on the first try instead of a five-message back-and-forth. The four pillars aren't a discipline you adopt out of perfectionism, they're how you stop wasting your own time.
If you remember nothing else: when an answer disappoints you, before blaming the model, re-read your prompt and ask yourself which of the four ingredients was missing. Almost always, at least one of them was.
Better questions, better answers. Same model.






