Chunking Strategies
Theory
RAG pipelines index documents as chunks because embedding models have token limits and dense vectors work best over focused passages.
Size trade-off: smaller chunks match queries precisely but risk losing surrounding context; larger chunks carry richer context but dilute the embedding signal. A common starting range is 256–512 tokens; tune against retrieval metrics for your corpus.
| Strategy | How it splits | When to prefer |
|---|---|---|
| Fixed-size | Every N tokens | Simple documents; fast baseline |
| Sentence/paragraph | Natural language boundaries | Prose with clear meaning units |
| Semantic | Embedding-based topic shifts | Long, heterogeneous documents |
| Recursive | Paragraphs → sentences → characters | Structured docs; balances size and structure |
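The recursive strategy in the table above can be sketched in plain Python: try the coarsest separator first, and only fall back to finer ones for pieces that are still too large. This is a minimal illustration (character-based sizing, hypothetical `recursive_chunk` helper), not a production tokenizer-aware splitter.

```python
def recursive_chunk(text, max_chars=1200, separators=("\n\n", ". ", " ")):
    """Split text by the coarsest separator whose pieces fit max_chars,
    recursing with finer separators only for oversized pieces."""
    if len(text) <= max_chars:
        return [text]
    for i, sep in enumerate(separators):
        if sep in text:
            chunks, buf = [], ""
            for piece in text.split(sep):
                candidate = buf + sep + piece if buf else piece
                if len(candidate) <= max_chars:
                    buf = candidate  # greedily pack pieces into one chunk
                    continue
                if buf:
                    chunks.append(buf)
                if len(piece) > max_chars:
                    # piece alone is too big: recurse with finer separators
                    chunks.extend(recursive_chunk(piece, max_chars, separators[i + 1:]))
                    buf = ""
                else:
                    buf = piece
            if buf:
                chunks.append(buf)
            return chunks
    # no separator left: hard character split as a last resort
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

Real implementations (e.g. LangChain's `RecursiveCharacterTextSplitter`) follow the same idea but measure size in tokens and add overlap.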
Overlap (10–20% of chunk size) shares tokens at boundaries so a sentence split across chunks appears in both, preserving continuity that retrieval would otherwise miss.
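Overlap amounts to a sliding window whose step is smaller than its width. A minimal sketch, assuming word-level tokens for readability (real pipelines would use a model tokenizer):

```python
def chunk_with_overlap(tokens, chunk_size=256, overlap=32):
    """Slide a window of chunk_size tokens, stepping by chunk_size - overlap,
    so each boundary region appears in two consecutive chunks."""
    step = chunk_size - overlap
    return [tokens[i:i + chunk_size]
            for i in range(0, max(len(tokens) - overlap, 1), step)]

words = "the quick brown fox jumps over the lazy dog".split()
chunks = chunk_with_overlap(words, chunk_size=4, overlap=1)
# the last token of each chunk reappears as the first token of the next
```

With 10-20% overlap the index grows proportionally, which is the usual price paid for boundary continuity.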
Metadata per chunk (source file, position, section title) enables filtered retrieval and source citation in the final answer.
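A common shape for carrying that metadata alongside each chunk is a small record type. This is a sketch with hypothetical names (`Chunk`, `make_chunks`); vector stores typically accept such fields as a metadata dict per vector.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source: str       # originating file or URL
    position: int     # chunk index within the document
    section: str = "" # nearest heading, if known

def make_chunks(doc_text, source, section=""):
    """Naive paragraph chunking that attaches provenance to each chunk."""
    paragraphs = [p for p in doc_text.split("\n\n") if p.strip()]
    return [Chunk(text=p, source=source, position=i, section=section)
            for i, p in enumerate(paragraphs)]
```

At query time these fields support filters ("only chunks from `source == 'manual.pdf'`") and let the generator cite where each retrieved passage came from.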
Next: chunks become embeddings in a vector store; vector search determines which chunks surface for a given query.