Taxonomy

1. Zero-shot
2. Few-shot
3. Chain-of-Thought (CoT)
4. Self-Consistency
5. Tree of Thoughts (ToT)
6. ReAct
7. Reflexion

Prompt caching

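Both Gemini and Anthropic support it. If the example block is an identical prompt prefix across all 2M requests, you can cache it and pay only a fraction of the normal input price for those tokens on cache hits (often ~10–25%). The discount applies only to the cached input tokens, so the total saving depends on how much of each request the examples make up. When they dominate, it can drop the bill 60–80% with zero accuracy change and one config flag.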
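A back-of-the-envelope check of that saving, using hypothetical numbers: a 4,000-token example block, a 500-token query, a 100-token answer, input at $1 and output at $4 per million tokens, and cache reads billed at 10% of the input price. The sketch ignores cache-write surcharges and cache expiry, so treat the result as an upper bound on the real saving.

```python
def request_cost(
    prefix_tokens: int,
    suffix_tokens: int,
    output_tokens: int,
    in_price_per_m: float,
    out_price_per_m: float,
    cache_read_mult: float = 1.0,  # 1.0 = no cache; 0.1 = cached prefix billed at 10%
) -> float:
    """Dollar cost of one request; the cached prefix is the few-shot example block."""
    input_cost = (prefix_tokens * cache_read_mult + suffix_tokens) * in_price_per_m
    output_cost = output_tokens * out_price_per_m
    return (input_cost + output_cost) / 1_000_000


# Hypothetical workload and prices (not taken from any provider's price list).
uncached = request_cost(4000, 500, 100, in_price_per_m=1.0, out_price_per_m=4.0)
cached = request_cost(4000, 500, 100, in_price_per_m=1.0, out_price_per_m=4.0, cache_read_mult=0.1)
savings = 1 - cached / uncached
print(f"saving: {savings:.0%}")  # saving: 73%
```

Because output tokens and the uncached query are billed at full price, the saving stays below the 90% discount on the prefix itself.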

dynamic prompting

Detect the complexity of each incoming query, then pick which technique the prompt for that query should use (e.g. zero-shot for a simple lookup, CoT for multi-step reasoning).
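A minimal sketch of such a router, using hypothetical keyword heuristics and a hypothetical length threshold (a real system might instead grade complexity with a small classifier or a cheap LLM call). The returned strings are just technique labels from this page; nothing here calls a model.

```python
# Hypothetical cue lists; tune or replace them for a real workload.
TOOL_CUES = ("latest", "today", "current", "look up", "search")
REASONING_CUES = ("why", "prove", "calculate", "compare", "how many", "step by step")
LONG_QUERY_WORDS = 60  # assumed threshold for "complex by length"


def choose_technique(query: str) -> str:
    """Map a query to the cheapest technique label that plausibly fits it."""
    q = query.lower()
    if any(cue in q for cue in TOOL_CUES):
        return "ReAct"  # needs fresh or external information: reason + act
    if any(cue in q for cue in REASONING_CUES) or len(q.split()) > LONG_QUERY_WORDS:
        return "few-shot + CoT"  # multi-step reasoning is likely
    return "zero-shot"  # simple lookup or transformation


for query in ("Translate 'hello' to French",
              "How many days are between March 3 and June 9?",
              "What is the latest stock price of ACME?"):
    print(choose_technique(query))
# prints, one per line: zero-shot, few-shot + CoT, ReAct
```

Checking for tool cues first reflects a design choice: a query that needs outside data fails without tools no matter how much reasoning scaffolding it gets.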

dynamic prompting (dp) methods

Each rung up costs more in latency, dollars, complexity, and maintenance. The technique ladders, cheapest → most expensive, look roughly like this:

1. Quality ladder: zero-shot → few-shot → few-shot + CoT → RAG (retrieve definitions/examples)
         → fine-tune → train your own

2. Cost-reduction ladder: prompt caching → smaller model → retrieval few-shot (fewer examples per call)
         → fine-tune
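One way to apply the first ladder at design time, sketched under assumptions: walk its prompt-level rungs (zero-shot, few-shot, few-shot + CoT, RAG) cheapest-first over a small labeled dev set and stop at the first rung that meets an accuracy target. The two stub techniques below are hypothetical stand-ins for real model calls; fine-tuning and training your own sit outside this loop because they are not per-request prompts.

```python
from typing import Callable, Optional, Sequence, Tuple

# A "technique" maps an input to a model output (hypothetical stand-in for an LLM call).
Technique = Callable[[str], str]


def cheapest_sufficient_rung(
    rungs: Sequence[Tuple[str, Technique]],
    dev_set: Sequence[Tuple[str, str]],
    target_accuracy: float,
) -> Optional[str]:
    """Return the first (cheapest) rung whose dev-set accuracy meets the target."""
    for name, technique in rungs:
        correct = sum(technique(text) == label for text, label in dev_set)
        if correct / len(dev_set) >= target_accuracy:
            return name
    return None  # no prompt-level rung is good enough: consider fine-tuning


# Hypothetical stubs: zero-shot always says "positive"; few-shot learned a keyword rule.
def zero_shot(text: str) -> str:
    return "positive"


def few_shot(text: str) -> str:
    return "negative" if any(w in text for w in ("awful", "boring")) else "positive"


dev = [("great film", "positive"), ("awful plot", "negative"),
       ("loved it", "positive"), ("boring", "negative")]
print(cheapest_sufficient_rung([("zero-shot", zero_shot), ("few-shot", few_shot)], dev, 0.9))
# → few-shot
```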
         
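Relative cost per query (zero-shot direct = 1x):
zero-shot direct          → 1x
few-shot                  → ~5x (longer input: the examples)
CoT (1 path)              → ~20x (longer output)
self-consistency N=10     → ~200x (10 CoT paths)
ReAct                     → ~Nx (N = number of reason/act/observe steps)
Reflexion (3 tries)       → ~3x whatever the base trajectory costs
ToT (50 nodes, eval each) → ~2000–5000x
thinking                  → variable (scales with the thinking tokens spent)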
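The big multipliers in that table are products of the small ones. A minimal sketch in the table's relative units, assuming each ToT node costs one generation plus one evaluation at roughly CoT price (this reproduces the low end of the ToT range; pricier evaluators or longer expansions push it toward the high end). The Reflexion example uses a single CoT path as a hypothetical base trajectory.

```python
# Relative cost units, normalized so one zero-shot direct call = 1 (figures from the table).
COT_PATH = 20  # one chain-of-thought path: ~20x, driven by the longer output


def self_consistency_cost(n_paths: int, path_cost: float = COT_PATH) -> float:
    # N independent CoT samples; the majority vote itself costs nothing
    return n_paths * path_cost


def reflexion_cost(tries: int, trajectory_cost: float) -> float:
    # each retry reruns the whole base trajectory (critique overhead ignored)
    return tries * trajectory_cost


def tot_cost(nodes: int, expand_cost: float = COT_PATH, eval_cost: float = COT_PATH) -> float:
    # assumption: every node costs one generation plus one evaluation call
    return nodes * (expand_cost + eval_cost)


print(self_consistency_cost(10))    # 200
print(reflexion_cost(3, COT_PATH))  # 60
print(tot_cost(50))                 # 2000
```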

zero-shot:          no scaffolding; trust the model
few-shot:           show the pattern by example
CoT:                have the model reason step by step before answering
self-consistency:   sample N reasoning paths, majority-vote the final answer (fixes noise)
ToT:                tree search over intermediate thoughts, evaluating each node (rarely right in production)
ReAct:              interleave reasoning, actions (tool calls), and observations (the dominant agent pattern)
Reflexion:          self-critique + retry (fixes bias; needs an evaluator)
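A minimal self-consistency sketch. `sample` is a hypothetical stand-in for one model call at temperature > 0, and the `Answer: <value>` final-line convention is an assumption. The vote runs over the extracted final answers, not over whole reasoning paths, since two differently worded paths can reach the same answer.

```python
import itertools
from collections import Counter
from typing import Callable


def extract_answer(completion: str) -> str:
    """Pull the final answer from a completion ending in 'Answer: <value>' (assumed format)."""
    for line in reversed(completion.strip().splitlines()):
        if line.startswith("Answer:"):
            return line[len("Answer:"):].strip()
    return completion.strip()  # fall back to the whole text if no marker is found


def self_consistency(sample: Callable[[str], str], prompt: str, n: int = 10) -> str:
    """Sample n reasoning paths and return the most common final answer."""
    answers = [extract_answer(sample(prompt)) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]


# Fake sampler standing in for a temperature > 0 model call: 2 of every 3 paths reach 42.
fake_paths = itertools.cycle(["6*7 = 42\nAnswer: 42", "6*7 = 41\nAnswer: 41", "7*6 = 42\nAnswer: 42"])
print(self_consistency(lambda _prompt: next(fake_paths), "What is 6*7? Think step by step.", n=9))  # 42
```

The noisy path (41) is outvoted, which is the "fixes noise" effect; a bias shared by most paths would survive the vote.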

Definition gap → few-shot or system prompt clarification
Knowledge gap → RAG, or fine-tune if the knowledge is stable + huge
Reasoning gap → CoT, self-consistency, or ToT
Format gap → strict output schema, few-shot showing the format, or constrained decoding
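The gap-to-remedy mapping can live in a lookup table that also records escalation order. A sketch, assuming the remedies within each gap are ordered cheapest-first (an ordering inferred from the ladder, not stated in the mapping itself); the gap keys and the `next_remedy` helper are hypothetical names.

```python
from typing import Optional, Set

# Remedies per gap, ordered cheapest-first (assumed ordering, following the ladder).
REMEDIES = {
    "definition": ["system prompt clarification", "few-shot"],
    "knowledge": ["RAG", "fine-tune"],  # fine-tune only if the knowledge is stable + huge
    "reasoning": ["CoT", "self-consistency", "ToT"],
    "format": ["strict output schema", "few-shot format examples", "constrained decoding"],
}


def next_remedy(gap: str, tried: Set[str]) -> Optional[str]:
    """Return the cheapest remedy for this gap not tried yet, or None if all are exhausted."""
    for remedy in REMEDIES[gap]:
        if remedy not in tried:
            return remedy
    return None


print(next_remedy("reasoning", {"CoT"}))  # self-consistency
```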