2. Basic RAG Pipeline

Documents → Parse → Chunk → Embed → Store in vector DB (+ metadata)

User query → Embed query → Vector search → Retrieve top-k chunks → 
  Build prompt (query + chunks) → LLM → Answer (+ citations)
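
To make the two flows concrete, here is a minimal sketch of the same pipeline in plain Python. It is an illustration under loud assumptions, not a production design: `embed` is a toy bag-of-words stand-in for a real embedding model, the "vector DB" is an in-memory list, the LLM call is left out (the sketch stops at the prompt), and every function name and sample document is hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): bag-of-words counts instead of a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk(doc: str, max_words: int = 12) -> list[str]:
    """Naive fixed-size chunking by word count."""
    words = doc.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_index(docs: dict[str, str]) -> list[dict]:
    """Documents -> chunk -> embed -> store (with source metadata)."""
    index = []
    for source, text in docs.items():
        for n, piece in enumerate(chunk(text)):
            index.append({"source": source, "chunk_id": n,
                          "text": piece, "vector": embed(piece)})
    return index


def retrieve(index: list[dict], query: str, k: int = 2) -> list[dict]:
    """Embed the query, rank all chunks by similarity, keep the top k."""
    q = embed(query)
    ranked = sorted(index, key=lambda e: cosine(q, e["vector"]), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: list[dict]) -> str:
    """Query + retrieved chunks, each tagged so the answer can cite it."""
    context = "\n".join(f"[{c['source']}#{c['chunk_id']}] {c['text']}" for c in chunks)
    return ("Answer using only the context below and cite sources.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")


# Hypothetical sample documents, invented for this sketch.
docs = {
    "refunds.txt": "Annual plans can be refunded within 30 days of purchase.",
    "shipping.txt": "Orders ship within two business days from our warehouse.",
}
index = build_index(docs)
query = "Can annual plans be refunded?"
top = retrieve(index, query, k=1)
prompt = build_prompt(query, top)  # this string would be sent to the LLM
```

In a real system the three stand-ins would be swapped for an embedding model, a vector store, and an LLM client; the control flow stays the same.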

Everything I just described is naive RAG. It works for a demo. It fails in production for predictable reasons:

Vector search alone misses exact-keyword matches (acronyms, product codes, names)
Top-k by similarity is not the same as top-k by usefulness
Chunks lose context when split (the sentence "It is not refundable" is useless without knowing what "it" is)
One-shot retrieval can't handle multi-hop questions ("What did the CEO say about the product mentioned in last quarter's earnings?")
No eval loop = no way to tell whether retrieval or answers are actually working
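
The context-loss failure is easy to reproduce. The sketch below is a hypothetical illustration: `split_fixed` and the sample policy text are assumptions made up for the example, and the splitter is a deliberately naive fixed-size word splitter. The second chunk is a grammatical sentence, but nothing in it says what "It" refers to, so retrieving it alone gives the LLM nothing usable.

```python
def split_fixed(text: str, max_words: int) -> list[str]:
    """Naive splitter: cut every max_words words, ignoring meaning."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


# Hypothetical policy text, invented for this sketch.
policy = "The onboarding fee covers account setup. It is not refundable."
chunks = split_fixed(policy, max_words=6)

for i, c in enumerate(chunks):
    print(i, repr(c))

# The referent ("fee") lives only in the first chunk.
referent_in_second_chunk = "fee" in chunks[1].lower()
```

Here `chunks[1]` is `"It is not refundable."` and `referent_in_second_chunk` is `False`: the fact survives the split, but the subject it applies to does not.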

We'll address each in: