(where RAG breaks, and how to diagnose which layer)
1. Parsing (offline) → 2. Chunking (offline) → 3. Embedding (offline) → 4. Retrieval (online) → 5. Generation (online) → 6. Output (online)
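A failure at any layer corrupts everything downstream: a bad chunk poisons the embedding, which poisons retrieval, which poisons generation, which poisons the answer. So debugging starts bottom-up, from the symptom at the output, but the checks themselves run top-down, from the earliest layer forward, so that the first failing check marks the suspected root cause.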
"The answer is wrong."
↓
Is the source document in the corpus? → No → ingestion problem
↓ Yes
Can a human read the parsed text? → No → parsing problem (Layer 1)
↓ Yes
Does the chunk make standalone sense? → No → chunking problem (Layer 2)
↓ Yes
Is the chunk in top-100 retrieval? → No → embedding (Layer 3) or filter (Layer 4) problem
↓ Yes
Is the chunk in top-k passed to LLM? → No → retrieval/ranking problem (Layer 4)
↓ Yes
Does a fresh LLM session answer correctly given the same chunks + prompt? → No → generation/prompt problem (Layer 5)
↓ Yes
Is the answer reaching the user in usable form? → No → output/UX problem (Layer 6)
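
One way to make the tree above repeatable is to encode it as a function that returns the first failing check. The following is a minimal Python sketch, not a definitive implementation. The `Evidence` record, the helper `rank_of`, the function `diagnose`, the default `k=5`, and the final fall-through message are all hypothetical names and values introduced here for illustration. Only the order of the checks and the top-100 cutoff come from the tree.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Evidence:
    """Observations collected for one wrong answer (hypothetical record)."""
    source_in_corpus: bool            # is the source document in the corpus?
    parsed_text_readable: bool        # can a human read the parsed text?
    chunk_standalone: bool            # does the chunk make standalone sense?
    ranked_chunk_ids: Sequence[str]   # retrieval output, best match first
    gold_chunk_id: str                # the chunk that contains the answer
    fresh_session_correct: bool       # fresh LLM session, same chunks + prompt
    output_usable: bool               # does the answer reach the user in usable form?


def rank_of(chunk_id: str, ranked_ids: Sequence[str]) -> Optional[int]:
    """Return the 1-based rank of chunk_id, or None if it was not retrieved."""
    for position, candidate in enumerate(ranked_ids, start=1):
        if candidate == chunk_id:
            return position
    return None


def diagnose(ev: Evidence, k: int = 5, wide: int = 100) -> str:
    """Walk the checks in pipeline order and report the first one that fails.

    k is the number of chunks passed to the LLM (assumed default);
    wide is the broad retrieval cutoff (top-100 in the tree).
    """
    if not ev.source_in_corpus:
        return "ingestion problem"
    if not ev.parsed_text_readable:
        return "parsing problem (Layer 1)"
    if not ev.chunk_standalone:
        return "chunking problem (Layer 2)"

    rank = rank_of(ev.gold_chunk_id, ev.ranked_chunk_ids)
    if rank is None or rank > wide:
        # The chunk never surfaces: suspect the embedding or a metadata filter.
        return "embedding (Layer 3) or filter (Layer 4) problem"
    if rank > k:
        # The chunk is retrieved but ranked too low to reach the LLM.
        return "retrieval/ranking problem (Layer 4)"

    if not ev.fresh_session_correct:
        return "generation/prompt problem (Layer 5)"
    if not ev.output_usable:
        return "output/UX problem (Layer 6)"
    # The tree has no branch for this case; this message is an assumption.
    return "not localized by these checks"
```

For instance, if the gold chunk is ranked third and `diagnose` is called with `k=2`, the function returns the retrieval/ranking label, since the chunk is retrieved but falls outside the top-k. Returning at the first failing check is deliberate: later checks are not meaningful while an earlier layer is still broken.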