6. When NOT to Use RAG

1. When the corpus fits in context
2. When the task is reasoning, not retrieval
3. When the knowledge is structured, not textual
4. When freshness requirements are sub-second
5. When the task is wide synthesis, not point lookup

The hybrid pattern — what to say when the customer needs more than one

RAG + tool calling: retrieve policy docs (RAG) AND look up the user's account balance (tool) — both feed the prompt
RAG + SQL: retrieve product description (RAG) AND query inventory levels (SQL)
RAG + long context: retrieve relevant docs from a 1M-doc corpus (RAG), put them all into a long-context window (no chunking on the retrieved subset)
Agents over RAG + tools: the agent decides whether each sub-question needs retrieval, SQL, or an API call
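
As a rough illustration of the first pattern (RAG + tool calling), the sketch below assembles a prompt from both sources. Everything in it is an assumption made for the example: the in-memory `POLICY_DOCS` and `BALANCES` data, the keyword-overlap retriever standing in for a real vector search, and the function names `retrieve_policy_docs`, `get_account_balance`, and `build_prompt`.

```python
# Hypothetical in-memory stand-ins for a document index and an account API.
POLICY_DOCS = [
    "Overdraft policy: accounts below zero incur a fee after 3 days.",
    "Refund policy: refunds are issued within 10 business days.",
]
BALANCES = {"user-42": 17.50}


def retrieve_policy_docs(question: str, k: int = 1) -> list[str]:
    """RAG side: rank docs by naive word overlap (a placeholder for vector search)."""
    words = set(question.lower().split())
    ranked = sorted(
        POLICY_DOCS,
        key=lambda doc: len(words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def get_account_balance(user_id: str) -> float:
    """Tool side: a live lookup, faked here with a dict."""
    return BALANCES[user_id]


def build_prompt(question: str, user_id: str) -> str:
    """Both the retrieved text and the tool result feed the same prompt."""
    docs = retrieve_policy_docs(question)
    balance = get_account_balance(user_id)
    context = "\n".join(f"- {doc}" for doc in docs)
    return (
        f"Policy context:\n{context}\n\n"
        f"Account balance: {balance:.2f}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("What is the overdraft fee policy?", "user-42"))
```

The point of the sketch is the shape: static text comes from retrieval, per-user live data comes from a tool, and neither replaces the other.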

The "do you actually need RAG" decision tree

1. Does the answer live in a corpus the customer owns?
        ↓ No → it's tool-calling or generation, not RAG
        ↓ Yes
2. Is the corpus larger than the context window?
        ↓ No → use long context, skip RAG
        ↓ Yes
3. Is the corpus mostly unstructured text?
        ↓ No → text-to-SQL or function calling
        ↓ Yes
4. Does the data change faster than indexing can keep up?
        ↓ Yes → live API calls + tool use
        ↓ No
5. Is the typical question point-lookup or wide synthesis?
        ↓ Wide synthesis → map-reduce, clustering, or pre-computed summaries
        ↓ Point lookup → RAG is the right answer
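
The tree above reads naturally as a chain of early returns. Here is a minimal sketch in Python; the `UseCase` dataclass, its field names, and the `recommend` function are hypothetical, chosen only to mirror the five questions in order.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    # Hypothetical fields, one per question in the tree.
    answer_in_owned_corpus: bool
    corpus_exceeds_context: bool
    mostly_unstructured_text: bool
    changes_faster_than_indexing: bool
    wide_synthesis: bool  # False means the typical question is a point lookup


def recommend(uc: UseCase) -> str:
    """Walk the five questions top to bottom; the first exit wins."""
    if not uc.answer_in_owned_corpus:
        return "tool calling or generation, not RAG"
    if not uc.corpus_exceeds_context:
        return "long context, skip RAG"
    if not uc.mostly_unstructured_text:
        return "text-to-SQL or function calling"
    if uc.changes_faster_than_indexing:
        return "live API calls + tool use"
    if uc.wide_synthesis:
        return "map-reduce, clustering, pre-computed"
    return "RAG"


if __name__ == "__main__":
    support_kb = UseCase(
        answer_in_owned_corpus=True,
        corpus_exceeds_context=True,
        mostly_unstructured_text=True,
        changes_faster_than_indexing=False,
        wide_synthesis=False,
    )
    print(recommend(support_kb))
```

Order matters: a small corpus exits at question 2 even if it is unstructured, which matches how the tree is drawn.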

The FDE answer to "we want to build a RAG system"