Chunking is the highest-leverage cheap optimization in RAG. Most "our RAG sucks" customer complaints trace back to bad chunks. Reason: the chunk is the unit of retrieval. If your chunks are wrong-shaped, no embedding model, reranker, or LLM can save you.
There is no universal right answer. The right chunk size depends on both the document type and the query type.
Overlap means chunks share some tokens at the boundary, so an idea that straddles a chunk break appears in both chunks.
Typical overlap is 10–20% of chunk size: roughly 50–100 tokens for a 512-token chunk. Tune empirically.
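
A minimal sketch of fixed-size chunking with overlap, to make the arithmetic concrete. The function name `chunk_with_overlap` and its defaults are illustrative assumptions, not from any particular library. It takes an already-tokenized list, so a real pipeline would first run the embedding model's tokenizer.

```python
from typing import List


def chunk_with_overlap(
    tokens: List[str], chunk_size: int = 512, overlap: int = 64
) -> List[List[str]]:
    """Split a token list into windows of `chunk_size` that share `overlap` tokens.

    Hypothetical helper for illustration; `tokens` is assumed to come from
    whatever tokenizer matches your embedding model.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


if __name__ == "__main__":
    # Whitespace splitting stands in for a real tokenizer here (assumption).
    doc_tokens = [f"tok{i}" for i in range(1000)]
    chunks = chunk_with_overlap(doc_tokens, chunk_size=512, overlap=64)
    print(len(chunks), [len(c) for c in chunks])
```

With a 512-token window and 64 tokens of overlap (about 12%), each window advances 448 tokens, so the last 64 tokens of one chunk are repeated as the first 64 of the next.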
| Document type | Chunk strategy | Size |
|---|---|---|
| Prose articles, books | Recursive split | 500–800 tokens |
| Technical docs with headings | Structure-aware (by section) | section-bounded |
| API reference, schemas | Per-endpoint or per-entity | natural unit |
| Legal contracts | Structure-aware + larger chunks | 800–1500 tokens |
| Tables / spreadsheets | Per-row or per-logical-group | row-bounded |
| Code | Per-function or per-class (AST-based) | function-bounded |
| Chat / conversations | Per-turn or per-thread | turn-bounded |
| Slack / ticket logs | Per-thread, time-windowed | thread-bounded |
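
As one concrete instance of the table, the "Code" row can be sketched with Python's standard `ast` module. This is a rough illustration for Python source only, and `chunk_python_source` is a hypothetical name. It emits one chunk per top-level function or class and ignores everything else.

```python
import ast
from typing import List, Tuple


def chunk_python_source(source: str) -> List[Tuple[str, str]]:
    """Return (name, source_text) for each top-level function or class.

    Illustrative sketch: module-level statements (imports, constants) are
    skipped, and decorator lines may not be included in the segment text.
    """
    tree = ast.parse(source)
    chunks: List[Tuple[str, str]] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment slices the original text using the node's
            # recorded start/end positions (Python 3.8+).
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                chunks.append((node.name, segment))
    return chunks


if __name__ == "__main__":
    sample = (
        "import os\n"
        "\n"
        "def a():\n"
        "    return 1\n"
        "\n"
        "class B:\n"
        "    def m(self):\n"
        "        return 2\n"
    )
    for name, text in chunk_python_source(sample):
        print(name, len(text.splitlines()))
```

Each chunk here is function-bounded or class-bounded rather than token-bounded, so sizes vary. A very large class may still need a second pass, for example splitting per method.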
Eval metrics that matter (preview of the eval section later):