2. The Decision Tree — When to Fine-Tune

Before discussing any specific method, internalize this hierarchy. Always work top-down, exhausting cheaper interventions before moving to more expensive ones.

Customer problem
    │
    ├─ Can prompt engineering fix it? ─── YES → stop here
    │       (clearer instructions, format spec, role)
    │   NO ↓
    │
    ├─ Can few-shot examples fix it? ──── YES → stop here
    │       (3-10 in-context examples)
    │   NO ↓
    │
    ├─ Is it a knowledge/fact problem? ── YES → RAG, not fine-tuning
    │   NO ↓
    │
    ├─ Is it style / format / narrow ──── YES → SFT (LoRA/QLoRA)
    │   task / domain language?
    │   NO ↓
    │
    ├─ Is it an "I want it to behave ──── YES → Preference tuning (DPO/RLHF)
    │   more like X, less like Y"
    │   problem?
    │   NO ↓
    │
    └─ Full retraining (almost never the answer)
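The tree is effectively an ordered chain of checks, where the first question answered "yes" decides the intervention. A minimal Python sketch, assuming a person has already answered each question as a yes/no judgment; `ProblemAssessment` and `choose_intervention` are hypothetical names for illustration, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class ProblemAssessment:
    """Hypothetical record of a reviewer's yes/no answers to the tree's questions."""
    prompt_engineering_fixes_it: bool     # clearer instructions, format spec, role
    few_shot_fixes_it: bool               # 3-10 in-context examples
    is_knowledge_problem: bool            # missing or changing facts
    is_style_format_or_narrow_task: bool  # style, format, narrow task, domain language
    is_behavioral_preference: bool        # "more like X, less like Y"


def choose_intervention(a: ProblemAssessment) -> str:
    """Hypothetical helper: return the first (cheapest) branch whose answer is yes."""
    if a.prompt_engineering_fixes_it:
        return "prompt engineering"
    if a.few_shot_fixes_it:
        return "few-shot examples"
    if a.is_knowledge_problem:
        return "RAG"
    if a.is_style_format_or_narrow_task:
        return "SFT (LoRA/QLoRA)"
    if a.is_behavioral_preference:
        return "preference tuning (DPO/RLHF)"
    # Every cheaper option was ruled out.
    return "full retraining"


# Example: invoice entity extraction that prompting and few-shot could not stabilize.
invoice_extraction = ProblemAssessment(
    prompt_engineering_fixes_it=False,
    few_shot_fixes_it=False,
    is_knowledge_problem=False,
    is_style_format_or_narrow_task=True,
    is_behavioral_preference=False,
)
print(choose_intervention(invoice_extraction))  # SFT (LoRA/QLoRA)
```

The order of the `if` chain carries the rule: a more expensive branch is reached only after every cheaper one above it has been ruled out.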

Strong signals FOR fine-tuning:

Consistent style, tone, or format that the model can't reliably hit through prompting alone
Specialized vocabulary (legal, medical, code, finance jargon)
Narrow repetitive task structure (extract entities from invoices in this exact schema)
You have hundreds to tens of thousands of high-quality labeled examples
Latency or cost constraints call for a smaller, specialized model

Strong signals AGAINST:

"We want the model to know about our products" → RAG, not fine-tuning
Information changes frequently (weekly, monthly) → RAG
< 100 examples → prompt engineering + few-shot
"We want to teach it our policies" → system prompt + RAG, not fine-tuning
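The two signal lists can double as a quick triage checklist for a customer project. A minimal sketch, assuming each signal has already been judged yes/no by a person; `ProjectFacts` and `triage` are hypothetical names, and using exactly 100 labeled examples as the cutoff between the FOR and AGAINST lists is an assumption drawn from the thresholds above.

```python
from dataclasses import dataclass


@dataclass
class ProjectFacts:
    """Hypothetical triage record for one customer project."""
    labeled_examples: int
    prompt_misses_style_or_format: bool  # style/tone/format not reliably hit by prompting
    specialized_vocabulary: bool         # legal, medical, code, finance jargon
    narrow_repetitive_task: bool         # e.g. invoice entities in a fixed schema
    needs_smaller_model: bool            # latency or cost constraints
    wants_product_knowledge: bool        # "know about our products"
    information_changes_often: bool      # weekly or monthly updates
    wants_policy_adherence: bool         # "teach it our policies"


def triage(p: ProjectFacts) -> tuple[list[str], list[str]]:
    """Hypothetical helper: return (signals_for, signals_against) fine-tuning."""
    signals_for: list[str] = []
    signals_against: list[str] = []
    if p.prompt_misses_style_or_format:
        signals_for.append("style/tone/format not reachable via prompting")
    if p.specialized_vocabulary:
        signals_for.append("specialized vocabulary")
    if p.narrow_repetitive_task:
        signals_for.append("narrow repetitive task structure")
    # Assumed cutoff: 100 labeled examples separates the two lists.
    if p.labeled_examples >= 100:
        signals_for.append("enough high-quality labeled examples")
    else:
        signals_against.append("fewer than 100 examples -> prompt engineering + few-shot")
    if p.needs_smaller_model:
        signals_for.append("latency/cost favors a smaller specialized model")
    if p.wants_product_knowledge:
        signals_against.append("product knowledge -> RAG")
    if p.information_changes_often:
        signals_against.append("frequently changing information -> RAG")
    if p.wants_policy_adherence:
        signals_against.append("policies -> system prompt + RAG")
    return signals_for, signals_against


# Example: a support assistant meant to answer questions about a changing catalog.
support_bot = ProjectFacts(
    labeled_examples=40,
    prompt_misses_style_or_format=False,
    specialized_vocabulary=False,
    narrow_repetitive_task=False,
    needs_smaller_model=False,
    wants_product_knowledge=True,
    information_changes_often=True,
    wants_policy_adherence=False,
)
pro, con = triage(support_bot)
print(len(pro), len(con))  # 0 3
```

The sketch only reports which signals fire and does not weigh them against each other, since the lists above give no weighting.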