BM25

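BM25 (Best Match 25) is the workhorse ranking function behind Lucene, Elasticsearch, and OpenSearch, and it is available in Postgres through extensions (the built-in full-text ranking functions, ts_rank and ts_rank_cd, are not BM25). It scores documents by how often query terms appear, with diminishing returns for repeated occurrences, normalized by document length so that long documents are not favored just for being long, and boosted when those terms are rare across the catalog. The math comes from probabilistic information retrieval research by Stephen Robertson, Karen Spärck Jones, and colleagues; BM25 itself emerged from the Okapi system in the 1990s.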
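As a rough illustration of that scoring, here is a minimal Python sketch of one common BM25 variant. It assumes documents are already tokenized into lists of lowercase terms, uses the widely used defaults k1 = 1.2 and b = 0.75, and uses the IDF form with +1 inside the logarithm. The function name `bm25_scores` and the sample catalog are made up for this example; production engines differ in details such as tokenization, field handling, and constant factors.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Illustrative sketch only; assumes `docs` is a non-empty list of token lists.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            # Rare terms get a higher weight; the +1 keeps the IDF non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 makes repeated occurrences saturate; b scales the length penalty.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical three-product catalog, tokenized by whitespace.
docs = [
    "red running shoes for men".split(),
    "blue running shoes running socks bundle".split(),
    "leather office shoes".split(),
]
scores = bm25_scores("running socks".split(), docs)
```

In this toy catalog the second product scores highest because it matches both query terms, and "socks" counts for more than "running" because it appears in fewer documents. The third product shares no terms with the query and scores zero.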

For ecommerce, BM25 still tends to beat vector-only search on exact-match queries (specific SKUs, brand names, model numbers), where lexical signals matter more than semantic similarity. It struggles on synonym-rich or natural-language queries (“something to wear to a beach wedding”), because it can only match terms that literally appear in the document; that is where semantic or hybrid search wins.

Many modern stacks use BM25 as the first-stage retriever and then re-rank the top 50–200 results with a vector or learning-to-rank model. The two complement each other: BM25 supplies cheap, high-recall lexical matching, and the re-ranker supplies semantic understanding.
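The retrieve-then-re-rank pattern can be sketched in a few lines of Python. Everything named here is hypothetical: `lexical_overlap` is a crude stand-in for a BM25 first stage, and `toy_reranker` (Jaccard similarity) stands in for a vector or learning-to-rank model, which in a real system would be a much more expensive call. Only the two-stage shape is the point.

```python
from typing import Callable, List, Sequence

Scorer = Callable[[str, str], float]

def retrieve_then_rerank(query: str, docs: Sequence[str],
                         first_stage: Scorer, reranker: Scorer,
                         k: int = 100, top_n: int = 10) -> List[str]:
    # Stage 1: run the cheap scorer over the whole collection, keep the top k.
    candidates = sorted(docs, key=lambda d: first_stage(query, d), reverse=True)[:k]
    # Stage 2: run the expensive scorer over those k candidates only.
    return sorted(candidates, key=lambda d: reranker(query, d), reverse=True)[:top_n]

def lexical_overlap(query: str, doc: str) -> float:
    # Stand-in for BM25: number of distinct query terms found in the document.
    return float(len(set(query.lower().split()) & set(doc.lower().split())))

def toy_reranker(query: str, doc: str) -> float:
    # Stand-in for a learned model: Jaccard similarity of the two term sets.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d)

# Hypothetical product titles.
catalog = [
    "nike air zoom pegasus 40 running shoe",
    "nike air max 90",
    "adidas ultraboost running shoe",
    "nike air zoom pegasus 40",
]
results = retrieve_then_rerank("nike air zoom pegasus 40", catalog,
                               lexical_overlap, toy_reranker, k=3, top_n=2)
```

Here the first stage drops the product with no lexical overlap, and the second stage reorders the remaining candidates so that the exact title match comes first. Keeping k in the 50–200 range bounds the cost of the second stage regardless of catalog size.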

Related terms

TF-IDF

Hybrid Search

Re-ranking

Semantic Search