Cosine Similarity

A similarity measure that scores the angle between two vectors while ignoring their magnitude, and the default similarity score for embedding-based search.

Cosine similarity returns a value in [-1, 1] (narrowing to [0, 1] when all vector components are non-negative) that measures how closely two vectors point in the same direction, regardless of their length. For text embeddings, magnitude is mostly noise, so cosine focuses on what matters: direction in the meaning space.
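A minimal sketch of the computation with NumPy: the dot product divided by the product of the vector norms. The example vectors are made up purely for illustration.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between a and b: dot product over magnitudes."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])    # same direction, twice the length
c = np.array([-1.0, -2.0, -3.0]) # opposite direction

print(cosine_similarity(a, b))   # close to 1.0: magnitude is ignored
print(cosine_similarity(a, c))   # close to -1.0: opposite directions
```

Note that `b` is twice as long as `a` yet scores a perfect match, which is exactly the magnitude-invariance the definition promises.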

Most embedding models output L2-normalized vectors. Once vectors are unit length, cosine similarity equals 1 − d² ÷ 2, where d is the Euclidean distance, so ranking by smallest Euclidean distance is identical to ranking by highest cosine. This lets vector indexes use Euclidean nearest-neighbor structures (HNSW graphs, FAISS indexes) and recover cosine ranking for free.
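A quick numeric check of that identity, using random vectors L2-normalized the way embedding models typically normalize their output (the dimension 384 is just a common embedding size, not anything specific):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=384)
b = rng.normal(size=384)

# L2-normalize, as most embedding models already do
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

cos = float(np.dot(a, b))          # cosine similarity of unit vectors
d = float(np.linalg.norm(a - b))   # Euclidean distance between them

# For unit vectors: d^2 = 2 - 2*cos, hence cos = 1 - d^2 / 2
assert abs(cos - (1 - d**2 / 2)) < 1e-12
```

Because the relationship is monotonic (larger cosine always means smaller distance), an index only needs to sort by one of the two to get the other's ranking.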

Practical thresholds vary by model and corpus, but with good ecommerce embeddings near-matches tend to cluster at cosine ≥ 0.7 and exact matches at ≥ 0.85. Calibrate per store; the absolute number is meaningless without context.
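The thresholding itself is a one-liner once calibrated. A sketch using the illustrative cutoffs above; the constants are hypothetical and should come from per-store calibration against labeled pairs, not from this example:

```python
# Hypothetical thresholds: calibrate per store against labeled match pairs
NEAR_MATCH = 0.70
EXACT_MATCH = 0.85

def classify(score: float) -> str:
    """Bucket a cosine similarity score into a match tier."""
    if score >= EXACT_MATCH:
        return "exact"
    if score >= NEAR_MATCH:
        return "near"
    return "unrelated"

print(classify(0.90))  # exact
print(classify(0.75))  # near
print(classify(0.30))  # unrelated
```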

Related terms