Phase 1: Indexing (offline, batch, runs when docs change)

Documents → Parse → Chunk → Embed → Store in vector DB (+ metadata)
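To make the indexing flow concrete, here is a minimal sketch of chunk, embed, and store using only the standard library. Everything in it is an assumption for illustration: `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, the "vector DB" is a plain Python list, and the names `Record`, `chunk_words`, and `index_documents` are hypothetical. Parsing is skipped; the input is assumed to be plain text already.

```python
import hashlib
import math
from dataclasses import dataclass, field


@dataclass
class Record:
    """One stored chunk: its vector, its text, and metadata for citations."""
    vector: list[float]
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words; neighbors share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Stand-in for a real embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def index_documents(docs: dict[str, str], size: int = 50, overlap: int = 10) -> list[Record]:
    """Chunk every document, embed each chunk, and keep source metadata alongside."""
    store = []
    for doc_id, text in docs.items():
        for i, chunk in enumerate(chunk_words(text, size, overlap)):
            store.append(Record(toy_embed(chunk), chunk, {"doc_id": doc_id, "chunk": i}))
    return store
```

Because this phase is a batch job, re-running `index_documents` when the docs change is the simplest update strategy; a real system would likely re-index only the documents that changed.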

Phase 2: Querying (online, per-request, runs on every user question)

User query → Embed query → Vector search → Retrieve top-k chunks → 
  Build prompt (query + chunks) → LLM → Answer (+ citations)
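The query path can be sketched the same way. This is a toy under stated assumptions, not a production retriever: `toy_embed` is a sparse bag-of-words stand-in for a real embedding model, the index is an in-memory list searched by brute force, the three sample chunks and their file names are made up, and the final LLM call is left out.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: sparse bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, index: list, k: int = 2) -> list:
    """Embed the query, score every stored chunk, and return the top-k hits."""
    q = toy_embed(query)
    scored = [(cosine(q, vec), source, text) for vec, source, text in index]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def build_prompt(query: str, hits: list) -> str:
    """Number the retrieved chunks so the model can cite them as [1], [2], ..."""
    context = "\n".join(
        f"[{i}] ({source}) {text}" for i, (_, source, text) in enumerate(hits, 1)
    )
    return (
        "Answer using only the numbered context and cite sources like [1].\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


# Hypothetical pre-indexed chunks: (source, text)
chunks = [
    ("refunds.md", "Refunds are issued within 14 days of purchase."),
    ("shipping.md", "Standard shipping takes 3 to 5 business days."),
    ("accounts.md", "Passwords can be reset from the login page."),
]
index = [(toy_embed(text), source, text) for source, text in chunks]

query = "How many days do refunds take?"
hits = retrieve(query, index, k=2)
prompt = build_prompt(query, hits)
# `prompt` would now be sent to an LLM; that call is omitted here.
```

Keeping the source name next to each chunk is what makes the "(+ citations)" step possible: the model is only asked to point back at numbered context it was given.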

Everything I just described is naive RAG. It works for a demo. It fails in production for predictable reasons:

We'll address each in:

4. Advanced RAG toolkit