Vector Search & Similarity
Theory
After chunks are embedded, they land in a vector index where retrieval means finding the K vectors closest to the query vector.
Similarity metrics:
| Metric | Measures | Best for |
|---|---|---|
| Cosine | Angle between vectors (ignores magnitude) | Text embeddings — most common |
| Dot product | Cosine scaled by vector magnitudes | Pre-normalized embeddings, where it equals cosine |
| Euclidean | Straight-line distance | Image / numeric embeddings |
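To make the relationships concrete, a minimal NumPy sketch with made-up toy vectors (the values are purely illustrative):

```python
import numpy as np

a = np.array([0.3, 0.8, 0.5])  # query embedding (toy values)
b = np.array([0.2, 0.9, 0.4])  # document embedding (toy values)

dot = np.dot(a, b)                                       # angle and magnitude
cosine = dot / (np.linalg.norm(a) * np.linalg.norm(b))   # angle only
euclidean = np.linalg.norm(a - b)                        # straight-line distance

print(f"dot={dot:.4f} cosine={cosine:.4f} euclidean={euclidean:.4f}")

# After normalizing to unit length, dot product equals cosine,
# which is why pre-normalized embeddings can use the cheaper metric.
a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)
assert np.isclose(np.dot(a_n, b_n),
                  np.dot(a_n, b_n) / (np.linalg.norm(a_n) * np.linalg.norm(b_n)))
```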
Brute-force search is O(N·D): every vector scanned across every dimension. At scale this is impractical. Approximate Nearest Neighbor (ANN) search trades exactness for speed, returning most of the true nearest neighbors most of the time (measured as recall). Three common algorithms: HNSW (Hierarchical Navigable Small World, a layered graph; fast and the default in Qdrant), IVF (Inverted File, which partitions the space into cells and searches only the nearest ones), and PQ (Product Quantization, which compresses vectors to save memory).
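For intuition on the cost ANN avoids, here is a brute-force top-K sketch; the corpus size, dimensionality, and random vectors are all illustrative:

```python
import numpy as np

N, D, K = 100_000, 384, 10  # corpus size, dimensions, results (illustrative)
rng = np.random.default_rng(0)
index = rng.standard_normal((N, D)).astype(np.float32)
index /= np.linalg.norm(index, axis=1, keepdims=True)  # normalize once: dot == cosine
query = rng.standard_normal(D).astype(np.float32)
query /= np.linalg.norm(query)

scores = index @ query                           # the O(N·D) scan ANN avoids
top_k = np.argpartition(scores, -K)[-K:]         # unordered top-K indices in O(N)
top_k = top_k[np.argsort(scores[top_k])[::-1]]   # order just the K results
print(top_k, scores[top_k])
```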
Vector stores manage indexing and ANN automatically:
| Type | Examples | Trade-off |
|---|---|---|
| Purpose-built | Qdrant, Pinecone, Weaviate | Rich metadata filters and scaling, but another service to run |
| DB extension | pgvector (Postgres) | Simpler ops, slower at large scale |
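As a concrete sketch against a purpose-built store, assuming the qdrant-client package (1.10+ for query_points), a hypothetical "chunks" collection, and toy 4-dimensional vectors:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(":memory:")  # in-process instance, handy for experiments

client.create_collection(
    collection_name="chunks",  # hypothetical collection name
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)

client.upsert(
    collection_name="chunks",
    points=[  # toy 4-dim vectors; real embeddings are typically 384+ dims
        PointStruct(id=1, vector=[0.1, 0.9, 0.2, 0.3],
                    payload={"text": "chunk one", "source": "doc_a"}),
        PointStruct(id=2, vector=[0.8, 0.1, 0.5, 0.2],
                    payload={"text": "chunk two", "source": "doc_b"}),
    ],
)

# Top-K similarity search; HNSW indexing happens behind the API.
hits = client.query_points(
    collection_name="chunks",
    query=[0.2, 0.8, 0.3, 0.3],
    limit=5,
).points
for hit in hits:
    print(hit.id, hit.score, hit.payload)
```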
Top-K (typically 5–20) controls how many chunks are returned per query. Too low drops relevant results; too high floods the LLM context with noise. When vector similarity alone misses exact keyword matches, hybrid search fills the gap.
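To show where the knob plugs in, a hypothetical retrieve helper built on the Qdrant sketch above; embed_fn (a text-to-vector function) and the payload's text field are assumptions:

```python
def retrieve(client, embed_fn, question: str, top_k: int = 10) -> list[str]:
    """Return the top_k chunk texts for a question (hypothetical helper).

    Small top_k risks dropping relevant chunks; large top_k pads the
    LLM prompt with noise and burns context tokens.
    """
    hits = client.query_points(
        collection_name="chunks",
        query=embed_fn(question),  # embed_fn: text -> vector, assumed
        limit=top_k,
    ).points
    return [hit.payload["text"] for hit in hits]
```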