Production prompt structure

Treat a prompt as a stack of distinct slots, not one blob. The canonical layering, top to bottom:

System message — role, persona, hard constraints, the output contract. Stable across requests, versioned in your repo. This is where you say "You are a financial Q&A assistant. Never speculate on future prices. Cite document IDs for every claim." Modern models (Gemini, Claude, GPT-4) genuinely attend differently to the system role; it's not just convention.

Few-shot examples — 2–5 worked input/output pairs. This is the highest-leverage slot. Demonstration beats description. A 3-line example of the JSON shape you want outperforms a 30-line description of the JSON shape you want. Pick examples that cover your edge cases (empty inputs, ambiguous inputs, the "I don't know" case).

Context — retrieved chunks, tool results, conversation history, user metadata. Dynamic. Wrap each chunk in delimiters (<doc id="123">...</doc> style) for two reasons: the model parses boundaries better, and it's a partial defense against prompt injection (instructions inside a retrieved doc are visibly inside a doc tag).

Query — the actual user input. Also delimited, also untrusted.

Format reminder — the last thing before generation. "Respond only with JSON matching the schema above. No prose, no markdown fences." This exists because of recency bias in autoregressive generation: tokens closest to the generation point have outsized influence. Long contexts in particular suffer from "lost in the middle" — instructions buried 8K tokens up get forgotten. The format reminder is your insurance policy.

For interview soundbite: "I structure prompts as system / few-shot / context / query / format reminder, with the reminder at the bottom to combat lost-in-the-middle in long contexts."