The layered failure-mode model

(where RAG breaks, and how to diagnose which layer)

1. Parsing  →  2. Chunking  →  3. Embedding  →  4. Retrieval  →  5. Generation  →  6. Output
   (offline)    (offline)       (offline)        (online)        (online)         (online)

A failure at any layer corrupts everything downstream: a bad chunk poisons the embedding, which poisons retrieval, which poisons generation, which poisons the answer. The symptom therefore surfaces at the output, while the root cause can sit in any earlier layer. To localize it, check the layers in pipeline order and stop at the first check that fails.
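
The propagation rule can be written down in a few lines. This is only a sketch of the model as stated above; the names Layer, OFFLINE, phase, and corrupted_by are made up for illustration and do not come from any library.

```python
from enum import IntEnum


class Layer(IntEnum):
    """The six pipeline layers, numbered in pipeline order."""
    PARSING = 1
    CHUNKING = 2
    EMBEDDING = 3
    RETRIEVAL = 4
    GENERATION = 5
    OUTPUT = 6


# Layers 1-3 are marked offline in the diagram; layers 4-6 are online.
OFFLINE = frozenset({Layer.PARSING, Layer.CHUNKING, Layer.EMBEDDING})


def phase(layer: Layer) -> str:
    """Return 'offline' or 'online' for a layer."""
    return "offline" if layer in OFFLINE else "online"


def corrupted_by(failed: Layer) -> list[Layer]:
    """Return the failed layer plus every layer downstream of it."""
    return [layer for layer in Layer if layer >= failed]


if __name__ == "__main__":
    # A chunking failure taints chunking and all four layers after it.
    for layer in corrupted_by(Layer.CHUNKING):
        print(f"Layer {layer.value} {layer.name.lower()} ({phase(layer)})")
```

Note what the sketch makes visible: a fault in an offline layer still shows up only in the online layers, which is why the visible symptom alone does not tell you where to look.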

Layer 1 — Parsing failures
Layer 2 — Chunking failures
Layer 3 — Embedding failures
Layer 4 — Retrieval failures
Layer 5 — Generation failures
Layer 6 — Output / UX failures

The diagnostic flowchart you should be able to draw on a whiteboard

"The answer is wrong."
        ↓
Is the source document in the corpus?  → No → ingestion problem
        ↓ Yes
Can a human read the parsed text?       → No → parsing problem (Layer 1)
        ↓ Yes
Does the chunk make standalone sense?   → No → chunking problem (Layer 2)
        ↓ Yes
Is the chunk in top-100 retrieval?      → No → embedding (3) or filter (4)
        ↓ Yes
Is the chunk in top-k passed to LLM?    → No → retrieval/ranking problem (4)
        ↓ Yes
Does a fresh LLM session answer right
  given the same chunks + prompt?       → No → generation/prompt problem (5)
        ↓ Yes
Is the answer reaching the user
  in usable form?                       → No → output/UX problem (6)
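
The flowchart above is an ordered list of yes/no checks in which the first "no" names the layer to investigate. A minimal sketch of that logic follows. The evidence keys (doc_in_corpus, in_top_100, and so on) and the diagnose function are hypothetical names chosen for this example; in practice each boolean would come from a manual inspection or a small script, not from this code.

```python
from typing import Optional

# (evidence key, question from the flowchart, diagnosis if the answer is "no").
# Order matters: it mirrors the flowchart from top to bottom.
CHECKS = [
    ("doc_in_corpus", "Is the source document in the corpus?",
     "ingestion problem"),
    ("parsed_text_readable", "Can a human read the parsed text?",
     "parsing problem (Layer 1)"),
    ("chunk_standalone", "Does the chunk make standalone sense?",
     "chunking problem (Layer 2)"),
    ("in_top_100", "Is the chunk in top-100 retrieval?",
     "embedding (3) or filter (4)"),
    ("in_top_k", "Is the chunk in top-k passed to LLM?",
     "retrieval/ranking problem (4)"),
    ("fresh_llm_correct", "Does a fresh LLM session answer right?",
     "generation/prompt problem (5)"),
    ("output_usable", "Is the answer reaching the user in usable form?",
     "output/UX problem (6)"),
]


def diagnose(evidence: dict[str, bool]) -> Optional[str]:
    """Walk the checks in order and return the first failing diagnosis.

    Returns None when every check passes, meaning the flowchart found no fault.
    Checks after the first failure are never looked up, so their evidence
    does not need to be collected.
    """
    for key, _question, diagnosis in CHECKS:
        if not evidence[key]:
            return diagnosis
    return None


if __name__ == "__main__":
    # The chunk is well formed but never shows up in the top-100 candidates.
    evidence = {
        "doc_in_corpus": True,
        "parsed_text_readable": True,
        "chunk_standalone": True,
        "in_top_100": False,
    }
    print(diagnose(evidence))  # prints: embedding (3) or filter (4)
```

Keeping the checks in a plain ordered list rather than nested conditionals means the whiteboard drawing and the code stay in one-to-one correspondence, and adding a check is a one-line change.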