Vector Search & Similarity
Theory
After chunks are embedded, they land in a vector index where retrieval means finding the K vectors closest to the query vector.
Similarity metrics:
| Metric | Measures | Best for |
|---|---|---|
| Cosine | Angle between vectors (ignores magnitude) | Text embeddings — most common |
| Dot product | Cosine scaled by vector magnitudes | Pre-normalized embeddings, where it equals cosine |
| Euclidean | Straight-line distance | Image / numeric embeddings |
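To make the relationships concrete, a minimal NumPy sketch with made-up toy vectors (the values are purely illustrative):

```python
import numpy as np

a = np.array([0.3, 0.8, 0.5])  # query embedding (toy values)
b = np.array([0.2, 0.9, 0.4])  # document embedding (toy values)

dot = np.dot(a, b)                                       # angle and magnitude
cosine = dot / (np.linalg.norm(a) * np.linalg.norm(b))   # angle only
euclidean = np.linalg.norm(a - b)                        # straight-line distance

print(f"dot={dot:.4f} cosine={cosine:.4f} euclidean={euclidean:.4f}")

# After normalizing to unit length, dot product equals cosine,
# which is why pre-normalized embeddings can use the cheaper metric.
a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)
assert np.isclose(np.dot(a_n, b_n),
                  np.dot(a_n, b_n) / (np.linalg.norm(a_n) * np.linalg.norm(b_n)))
```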
Brute-force search is O(N·D): every vector scanned across every dimension. At scale this is impractical. Approximate Nearest Neighbor (ANN) search trades exactness for speed, returning most of the true nearest neighbors most of the time (measured as recall). Three common algorithms: HNSW (Hierarchical Navigable Small World, a layered graph; fast and the default in Qdrant), IVF (Inverted File, which partitions the space into cells and searches only the nearest ones), and PQ (Product Quantization, which compresses vectors to save memory).
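For intuition on the cost ANN avoids, here is a brute-force top-K sketch; the corpus size, dimensionality, and random vectors are all illustrative:

```python
import numpy as np

N, D, K = 100_000, 384, 10  # corpus size, dimensions, results (illustrative)
rng = np.random.default_rng(0)
index = rng.standard_normal((N, D)).astype(np.float32)
index /= np.linalg.norm(index, axis=1, keepdims=True)  # normalize once: dot == cosine
query = rng.standard_normal(D).astype(np.float32)
query /= np.linalg.norm(query)

scores = index @ query                           # the O(N·D) scan ANN avoids
top_k = np.argpartition(scores, -K)[-K:]         # unordered top-K indices in O(N)
top_k = top_k[np.argsort(scores[top_k])[::-1]]   # order just the K results
print(top_k, scores[top_k])
```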
Vector stores manage indexing and ANN automatically:
| Type | Examples | Trade-off |
|---|---|---|
| Purpose-built | Qdrant, Pinecone, Weaviate | Rich metadata filters and scaling, but another service to run |
| DB extension | pgvector (Postgres) | Simpler ops, slower at large scale |
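As a concrete sketch against a purpose-built store, assuming the qdrant-client package (1.10+ for query_points), a hypothetical "chunks" collection, and toy 4-dimensional vectors:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(":memory:")  # in-process instance, handy for experiments

client.create_collection(
    collection_name="chunks",  # hypothetical collection name
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)

client.upsert(
    collection_name="chunks",
    points=[  # toy 4-dim vectors; real embeddings are typically 384+ dims
        PointStruct(id=1, vector=[0.1, 0.9, 0.2, 0.3],
                    payload={"text": "chunk one", "source": "doc_a"}),
        PointStruct(id=2, vector=[0.8, 0.1, 0.5, 0.2],
                    payload={"text": "chunk two", "source": "doc_b"}),
    ],
)

# Top-K similarity search; HNSW indexing happens behind the API.
hits = client.query_points(
    collection_name="chunks",
    query=[0.2, 0.8, 0.3, 0.3],
    limit=5,
).points
for hit in hits:
    print(hit.id, hit.score, hit.payload)
```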
Top-K (typically 5–20) controls how many chunks are returned per query. Too low drops relevant results; too high floods the LLM context with noise. When vector similarity alone misses exact keyword matches, hybrid search fills the gap.
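To show where the knob plugs in, a hypothetical retrieve helper built on the Qdrant sketch above; embed_fn (a text-to-vector function) and the payload's text field are assumptions:

```python
def retrieve(client, embed_fn, question: str, top_k: int = 10) -> list[str]:
    """Return the top_k chunk texts for a question (hypothetical helper).

    Small top_k risks dropping relevant chunks; large top_k pads the
    LLM prompt with noise and burns context tokens.
    """
    hits = client.query_points(
        collection_name="chunks",
        query=embed_fn(question),  # embed_fn: text -> vector, assumed
        limit=top_k,
    ).points
    return [hit.payload["text"] for hit in hits]
```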