Chunking is the highest-leverage cheap optimization in RAG. Most "our RAG sucks" customer complaints trace back to bad chunks. Reason: the chunk is the unit of retrieval. If your chunks are wrong-shaped, no embedding model, reranker, or LLM can save you.

The fundamental tension

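Small chunks match a query precisely but lose the surrounding context; large chunks keep the context but dilute the embedding and spend more of the prompt budget. There is no universal right answer. The right chunk size depends on document type and query type.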

Overlap — the parameter everyone gets wrong

Overlap means chunks share some tokens at the boundary, so an idea that straddles a chunk break appears in both chunks.

Typical overlap is 10–20% of chunk size, which works out to roughly 50–100 tokens for a 512-token chunk. Tune empirically.
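A minimal sketch of overlapping windows, to make the arithmetic concrete. It assumes the text has already been tokenized into a list; the function name chunk_with_overlap and the defaults (512-token chunks, 64-token overlap, about 12%) are illustrative choices, not a prescribed implementation.

```python
def chunk_with_overlap(tokens, chunk_size=512, overlap=64):
    """Split a token list into windows that share `overlap` tokens at each boundary.

    `tokens` can be any list (token ids, words, ...). A real pipeline would
    tokenize with the embedding model's tokenizer; that step is assumed here.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end, so no tail-only duplicate is emitted.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

With the defaults, each window advances by 448 tokens, so the last 64 tokens of one chunk are the first 64 of the next. Splitting on whitespace (text.split()) is a quick stand-in for a tokenizer when experimenting, though the counts will not match the model's token counts.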

Document-type rules of thumb

Document type                | Chunk strategy                        | Size
-----------------------------+---------------------------------------+-----------------
Prose articles, books        | Recursive split                       | 500–800 tokens
Technical docs with headings | Structure-aware (by section)          | section-bounded
API reference, schemas       | Per-endpoint or per-entity            | natural unit
Legal contracts              | Structure-aware + larger chunks       | 800–1500 tokens
Tables / spreadsheets        | Per-row or per-logical-group          | row-bounded
Code                         | Per-function or per-class (AST-based) | function-bounded
Chat / conversations         | Per-turn or per-thread                | turn-bounded
Slack / ticket logs          | Per-thread, time-windowed             | thread-bounded
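
As one illustration of the structure-bounded strategies listed above, here is a rough sketch of AST-based chunking for Python source using the standard library's ast module. The function name chunk_python_source and the dict layout are hypothetical; a production chunker would also need to handle decorators, module-level statements, very long functions, and other languages.

```python
import ast


def chunk_python_source(source: str) -> list[dict]:
    """Return one chunk per top-level function or class in `source`.

    Simplifications in this sketch: module-level statements (imports,
    constants) are skipped, and decorator lines are not included in the
    chunk text because the node's span starts at the def/class line.
    """
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "name": node.name,
                "kind": type(node).__name__,
                # Exact source text covered by this node (Python 3.8+).
                "text": ast.get_source_segment(source, node),
            })
    return chunks
```

Each chunk is a complete syntactic unit, so a retrieved function never starts or ends mid-body. Keeping the name and kind alongside the text is a design choice that lets them be stored as metadata for filtering or display.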

Eval metrics that matter (preview of the eval section later):