Act 1

Foundations


Embeddings

Act 1 · ~5 min

Theory

An embedding is a vector — a list of floats — produced by an encoder model from text. The encoder is trained so that semantically similar inputs produce vectors pointing in similar directions. Cosine similarity between two embeddings ranges from −1 (opposite) to +1 (near-identical).
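Cosine similarity can be computed directly from two vectors — the dot product divided by the product of their lengths. A minimal sketch (the function name is illustrative):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Same direction -> +1, opposite -> -1, orthogonal -> 0.
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))   # 1.0
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # -1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0
```

Note that the score depends only on direction, not length — scaling a vector leaves its cosine similarity to every other vector unchanged.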

[Figure] Pipeline: text "puppy" → encoder (embed) → vector [0.21, -0.07, …]

[Figure] Embedding space (2D simplified) — "puppy" and "dog" cluster under animals (cos ≈ 0.9); "database" and "server" cluster under tech (cos ≈ 0.9); puppy ↔ server cos ≈ 0.1. Semantic neighbors cluster, unrelated terms drift apart.
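The clusters in the figure can be mimicked with tiny hand-picked 2D vectors — these numbers are invented for illustration, not real encoder output:

```python
import math

# Hand-picked 2D "embeddings" (illustrative only, not real model output).
vectors = {
    "puppy":    [0.95, 0.10],   # animals cluster
    "dog":      [0.90, 0.20],
    "database": [0.10, 0.95],   # tech cluster
    "server":   [0.15, 0.90],
}

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

print(cos(vectors["puppy"], vectors["dog"]))     # close to 1: same cluster
print(cos(vectors["puppy"], vectors["server"]))  # close to 0: unrelated
```

Real embeddings have hundreds or thousands of dimensions, but the geometry is the same: neighbors in meaning are neighbors in angle.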
Keyword search: "store data" matches only docs containing those exact words; paraphrases are missed.

Embedding search: finds docs about databases, caching, and indexing — same meaning, different words.
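The contrast can be shown on a toy corpus. Keyword search is real here; the embedding side uses a hand-made lookup as a stand-in for a real encoder (the corpus, query, and vectors are all invented for illustration):

```python
import math

# Toy corpus; the query paraphrases doc 0 without sharing any of its words.
docs = [
    "databases persist records on disk",
    "the weather is sunny today",
]
query = "store data"

# Keyword search: exact word overlap only.
def keyword_hits(query, docs):
    q = set(query.split())
    return [d for d in docs if q & set(d.split())]

print(keyword_hits(query, docs))  # [] -- the paraphrase is missed

# Embedding search (stand-in): a real encoder would map "store data" and
# the database doc to nearby vectors; this dict fakes that behaviour.
fake_embedding = {
    "store data": [0.90, 0.10],
    "databases persist records on disk": [0.85, 0.20],
    "the weather is sunny today": [0.05, 0.95],
}

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

best = max(docs, key=lambda d: cos(fake_embedding[query], fake_embedding[d]))
print(best)  # the database doc wins on meaning, not shared words
```

In a real system the `fake_embedding` lookup would be replaced by calls to an encoder model; everything else stays the same.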
Model                  | Dim  | Best For
all-MiniLM-L6-v2       | 384  | Local / offline RAG
nomic-embed-text       | 768  | Balanced quality
text-embedding-ada-002 | 1536 | OpenAI-hosted apps
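One practical consequence of the Dim column: each model emits vectors of a fixed length, and vectors from different models live in unrelated spaces, so a single index must embed everything with one model. A small guard illustrating the idea (the function name is an invented helper):

```python
# Output dimensions from the table above.
MODEL_DIMS = {
    "all-MiniLM-L6-v2": 384,
    "nomic-embed-text": 768,
    "text-embedding-ada-002": 1536,
}

def check_vector(model: str, vector: list[float]) -> None:
    """Reject vectors whose length doesn't match the model's output dim --
    mixing models (or dims) in one index silently breaks similarity search."""
    expected = MODEL_DIMS[model]
    if len(vector) != expected:
        raise ValueError(f"{model} emits {expected}-d vectors, got {len(vector)}")

check_vector("all-MiniLM-L6-v2", [0.0] * 384)  # ok
try:
    check_vector("all-MiniLM-L6-v2", [0.0] * 1536)  # wrong model's output
except ValueError as e:
    print(e)
```

Vector databases typically enforce the same rule at index-creation time by fixing the dimension per collection.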