Taxonomy

Production prompt structure

Patterns that work

Anti-patterns — what kills production systems

The interview-ready synthesis: if asked "how do you approach prompt engineering in production," the structure is a layered prompt with a strict schema, an eval suite from day one, prompts versioned in git like code, and a clear-eyed view of when the bug isn't in the prompt at all. That last clause is what separates an FDE answer from a prompt-influencer answer.
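To make "layered prompt with strict schema, eval suite from day one" concrete, here is a minimal sketch. Everything in it is an illustrative assumption rather than a prescribed implementation: the ticket-classifier task, the layer names, `PROMPT_VERSION`, `ALLOWED_CATEGORIES`, and the `model` callable (any function that takes a prompt string and returns raw text).

```python
import json

# Hypothetical prompt layers. In practice these would live in a git-tracked file.
PROMPT_VERSION = "v1"
LAYERS = {
    "role": "You are a support-ticket classifier.",
    "task": "Classify the ticket into exactly one category.",
    "schema": 'Respond with JSON only: {"category": string, "confidence": number from 0 to 1}.',
}
ALLOWED_CATEGORIES = {"billing", "bug", "other"}  # assumed label set


def build_prompt(ticket: str) -> str:
    """Join the layers in a fixed order, then append the user input."""
    parts = [LAYERS["role"], LAYERS["task"], LAYERS["schema"], f"Ticket: {ticket}"]
    return "\n\n".join(parts)


def validate(raw: str) -> bool:
    """Return True only if the raw model output matches the strict schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict) or set(data) != {"category", "confidence"}:
        return False
    category, confidence = data["category"], data["confidence"]
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        return False
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return False
    return 0 <= confidence <= 1


def run_evals(model, cases) -> float:
    """Run (ticket, expected_category) cases through `model`; return the pass rate."""
    passed = 0
    for ticket, expected in cases:
        raw = model(build_prompt(ticket))
        if validate(raw) and json.loads(raw)["category"] == expected:
            passed += 1
    return passed / len(cases)
```

Recording the pass rate alongside `PROMPT_VERSION` on every prompt change is one way to be able to tell whether a change made things worse.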

Quick reframe for your protocol. I'd suggest tightening the sequence to:

  1. Define the bug (what does "inconsistent" mean, examples please)
  2. Scope (new vs. always, % of traffic, any recent changes — deploys, model version bumps, traffic mix shifts)
  3. Cheapest checks first (sampling params, model version, any A/B test running)
  4. Then the eval/versioning audit (do they even know if it got worse?)
  5. Then prompt structure
  6. Then retrieval / context (is the variance coming from retrieved chunks?)

Notice the principle: cheapest hypothesis tested first. Checking the temperature config takes 30 seconds. Auditing their entire prompt structure takes 2 hours. An interviewer grading "structured diagnosis" wants to see you order by cost-to-test.
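The ordering rule can be sketched in a few lines. Only the 30-second and 2-hour figures come from the estimates above. Every other cost, and the `Check` and `diagnosis_order` names, are placeholder assumptions to show the shape of the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    name: str
    minutes: float  # rough cost-to-test estimate, not a measurement


# Deliberately listed out of order. Costs other than 0.5 and 120 are assumed.
CHECKS = [
    Check("audit prompt structure", 120),
    Check("confirm sampling params (temperature)", 0.5),
    Check("inspect retrieved chunks for variance", 180),
    Check("confirm model version", 1),
    Check("look for a running A/B test", 5),
    Check("audit evals and prompt versioning", 30),
]


def diagnosis_order(checks):
    """Return the checks sorted so the cheapest hypothesis is tested first."""
    return sorted(checks, key=lambda c: c.minutes)


for check in diagnosis_order(CHECKS):
    print(f"{check.minutes:>6} min  {check.name}")
```

A fuller version might also weight each check by how likely it is to be the cause, but cost alone is enough to reproduce the sequence above.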