-
- When the corpus fits in context
-
- When the task is reasoning, not retrieval
-
- When the knowledge is structured, not textual
-
- When freshness requirements are sub-second
-
- When the task is wide synthesis, not point lookup
The hybrid pattern — what to say when the customer needs more than one
- RAG + tool calling: retrieve policy docs (RAG) AND look up the user's account balance (tool) — both feed the prompt
- RAG + SQL: retrieve product description (RAG) AND query inventory levels (SQL)
- RAG + long context: retrieve relevant docs from a 1M-doc corpus (RAG), put them all into a long-context window (no chunking on the retrieved subset)
- Agents over RAG + tools: the agent decides whether each sub-question needs retrieval, SQL, or an API call
The "do you actually need RAG" decision tree
1. Does the answer live in a corpus they own?
↓ No → it's tool-calling or generation, not RAG
↓ Yes
2. Is the corpus larger than the context window?
↓ No → use long context, skip RAG
↓ Yes
3. Is the corpus mostly unstructured text?
↓ No → text-to-SQL or function calling
↓ Yes
4. Does the data change faster than indexing can keep up?
↓ Yes → live API calls + tool use
↓ No
5. Is the typical question point-lookup or wide synthesis?
↓ Wide synthesis → map-reduce, clustering, pre-computed
↓ Point lookup → RAG is the right answer
- The FDE answer to "we want to build a RAG system"