These are the canonical RRK trap scenarios. Memorize the failure mode and the right fix.
Mismatch #1: "We want to fine-tune on our company docs"
- What they want: a model that answers questions about their docs
- What they need: RAG, not fine-tuning
- Why: fine-tuning on a small dataset does not reliably encode facts; it shifts the model's output distribution (tone, format, behavior). Facts belong in a retrievable knowledge base. Fine-tuning alone does not make hallucinations go away.
- FDE phrasing: "Fine-tuning is great for teaching the model how to do something, but not for teaching it what to know. For knowledge, we want retrieval — that way when your docs update next quarter, we don't have to retrain."
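
To make the retrieval point concrete, here is a minimal sketch of the RAG pattern in Python. Everything in it is an assumption for illustration: the `DOCS` contents, the helper names, and the token-overlap scoring, which stands in for the embedding search a real system would use. It shows that the facts live in the document store and reach the model through the prompt, so updating a document changes the answer with no retraining.

```python
import re
from collections import Counter

# Hypothetical in-memory knowledge base; a real deployment would use a vector store.
DOCS = {
    "pto-policy": "Employees accrue 1.5 days of paid time off per month.",
    "expense-policy": "Expenses over 500 dollars require manager approval.",
    "vpn-guide": "Connect to the VPN before accessing internal dashboards.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, doc_text):
    """Toy relevance score: how many query tokens also occur in the document."""
    query_counts = Counter(tokenize(query))
    doc_tokens = set(tokenize(doc_text))
    return sum(n for tok, n in query_counts.items() if tok in doc_tokens)


def retrieve(query, docs, k=1):
    """Return the k (doc_id, text) pairs with the highest overlap score."""
    ranked = sorted(docs.items(), key=lambda kv: score(query, kv[1]), reverse=True)
    return ranked[:k]


def build_prompt(query, docs, k=1):
    """Assemble a grounded prompt: retrieved context first, then the question."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(query, docs, k))
    return (
        "Answer using only the context below. If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


question = "How many days of paid time off do I accrue?"
prompt_before = build_prompt(question, DOCS)

# The policy changes next quarter: edit the document, not the model.
DOCS["pto-policy"] = "Employees accrue 2 days of paid time off per month."
prompt_after = build_prompt(question, DOCS)
```

The string returned by `build_prompt` is what would be sent to whichever model is in use; the model call itself is left out because it depends on the provider.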
Mismatch #2: "We fine-tuned and it still hallucinates"
- Cause: fine-tuning on a few thousand examples doesn't reliably add factual knowledge
- Fix: hybrid — SFT for style/format, RAG for facts
Mismatch #3: "We fine-tuned and now it can't do basic things anymore"
- Catastrophic forgetting from over-tuning on narrow data
- Fixes: use LoRA (base weights stay frozen, so the adapter can be detached), lower the learning rate, train for fewer epochs, mix in general-purpose data, or accept that this task needs its own adapter rather than a single tuned model for everything
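
The "preserves base weights" point can be sketched in a few lines of plain Python. This is a toy illustration of the LoRA idea (a frozen weight matrix W plus a scaled low-rank update B·A), not any library's implementation; the matrices, the `alpha` value, and the function names are made up for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, W, A, B, alpha, enabled=True):
    """Compute x @ (W + (alpha / r) * B @ A) without ever modifying W."""
    base = matmul(x, W)
    if not enabled:
        return base  # adapter detached: exactly the base model's output
    r = len(A)  # LoRA rank = number of rows in A
    delta = matmul(matmul(x, B), A)  # low-rank path: x @ B @ A
    scale = alpha / r
    return [[b + scale * d for b, d in zip(b_row, d_row)]
            for b_row, d_row in zip(base, delta)]


# Hypothetical 2x2 base layer with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weights (d_in x d_out)
B = [[1.0], [2.0]]             # trainable, d_in x r
A = [[0.5, -0.5]]              # trainable, r x d_out
x = [[1.0, 1.0]]

tuned = lora_forward(x, W, A, B, alpha=2.0)                      # [[4.0, -2.0]]
original = lora_forward(x, W, A, B, alpha=2.0, enabled=False)    # [[1.0, 1.0]]
```

Because W is never written to, switching the adapter off recovers the base behavior exactly, which is also why one base model can carry a separate adapter per narrow task.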
Mismatch #4: "Fine-tuning will solve our cost problem"
- Maybe, if you can fine-tune a smaller, cheaper model (e.g. Flash or a distilled model) to match Pro quality on your narrow task
- Not if you're tuning the same-size model. Then cost is unchanged or worse.
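
A quick back-of-the-envelope check helps in this conversation. The sketch below uses placeholder numbers throughout: the per-million-token prices, the traffic volume, and the one-time tuning cost are invented for illustration and are not real pricing for any model.

```python
def monthly_cost(requests, tokens_per_request, price_per_million_tokens):
    """Monthly inference spend for a given traffic level and token price."""
    return requests * tokens_per_request * price_per_million_tokens / 1_000_000


def break_even_months(one_time_tuning_cost, monthly_before, monthly_after):
    """Months until tuning pays for itself, or None if it never does."""
    saving = monthly_before - monthly_after
    if saving <= 0:
        return None
    return one_time_tuning_cost / saving


# Placeholder assumptions, NOT real prices.
REQUESTS = 1_000_000        # requests per month
TOKENS = 1_000              # tokens per request
LARGE_PRICE = 10.0          # currency units per million tokens, large model
SMALL_PRICE = 1.0           # currency units per million tokens, small tuned model
TUNING_COST = 4_500.0       # one-time cost to tune and evaluate

large = monthly_cost(REQUESTS, TOKENS, LARGE_PRICE)   # 10000.0
small = monthly_cost(REQUESTS, TOKENS, SMALL_PRICE)   # 1000.0

# Tuning a smaller model that matches quality: pays back quickly.
downsized = break_even_months(TUNING_COST, large, small)   # 0.5

# Tuning the same-size model: per-token price is unchanged, so no payback.
same_size = break_even_months(TUNING_COST, large, large)   # None
```

The calculation only holds if the smaller model really does match quality on the narrow task, so it belongs after an evaluation, not before one.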
Mismatch #5: "We want to teach the model our brand voice"
- This is a fine-tuning use case — but specifically preference tuning (DPO) with curated good/bad style pairs, not SFT
- Or, often easier: a strong system prompt with 3-5 few-shot examples gets you roughly 80% of the way there
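
Both routes start from the same curated examples. The sketch below is illustrative only: the sample texts and helper names are hypothetical, and the `prompt` / `chosen` / `rejected` field names are an assumption about what the preference-tuning tooling expects, so check them against whatever trainer is actually used.

```python
import json

# Hypothetical curated examples: one request, answered on-brand and off-brand.
STYLE_EXAMPLES = [
    {
        "prompt": "Tell the customer their order shipped.",
        "on_brand": "Good news: your order is on its way!",
        "off_brand": "Order status changed to SHIPPED.",
    },
    {
        "prompt": "Apologize for a late delivery.",
        "on_brand": "We're sorry for the wait. Here's what we're doing to fix it.",
        "off_brand": "Delays may occur. See terms and conditions.",
    },
    {
        "prompt": "Explain how to reset a password.",
        "on_brand": "No problem, it takes about a minute. Start from the login page.",
        "off_brand": "Password reset is available via the authentication module.",
    },
]


def to_preference_pairs(examples):
    """Route 1: good/bad style pairs for preference tuning (e.g. DPO)."""
    return [
        {"prompt": e["prompt"], "chosen": e["on_brand"], "rejected": e["off_brand"]}
        for e in examples
    ]


def to_jsonl(records):
    """Serialize records one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)


def build_system_prompt(examples, max_shots=5):
    """Route 2: a system prompt that shows only the on-brand answers as few-shot examples."""
    lines = ["Write in our brand voice. Match the tone of these examples."]
    for i, e in enumerate(examples[:max_shots], start=1):
        lines.append(f"Example {i}")
        lines.append(f"Request: {e['prompt']}")
        lines.append(f"Response: {e['on_brand']}")
    return "\n".join(lines)


pairs = to_preference_pairs(STYLE_EXAMPLES)
pairs_jsonl = to_jsonl(pairs)
system_prompt = build_system_prompt(STYLE_EXAMPLES)
```

Starting with the system prompt is cheap, and the same example set can later be reused as preference pairs if prompting alone falls short.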