TF-IDF

Term Frequency × Inverse Document Frequency: the classical weighting scheme that scores a term highly when it appears often in a document but rarely across the corpus.

TF-IDF is the foundation that BM25 builds on. The intuition: a word is meaningful for ranking when it shows up a lot in this product but is rare across the whole catalog. “Lightweight” mentioned five times in a jacket description matters more than “the” mentioned five times.
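To make the intuition concrete, here is a minimal sketch of one common TF-IDF variant: raw term count multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the term. The function name `tf_idf`, the whitespace tokenizer, and the three-item catalog are illustrative assumptions; real libraries typically add smoothing and normalization on top of this.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        weights.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return weights

# Made-up catalog for illustration only.
catalog = [
    "the lightweight jacket is the lightweight choice",
    "the wool coat is the warm choice",
    "the rain shell is the dry choice",
]
scores = tf_idf(catalog)
print(scores[0]["lightweight"], scores[0]["the"])
```

In this toy catalog, the word that appears in every description gets an IDF of log(1) = 0 and so contributes nothing, while a word confined to a single description keeps a positive weight even though both occur twice in the first item.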

It’s rarely used directly anymore: BM25 fixes TF-IDF’s two biggest weaknesses (no diminishing returns on term frequency, no document-length normalization). But the underlying idea, that rarity matters, is everywhere in retrieval.
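The two weaknesses can be seen by comparing raw term frequency with the term-frequency component of the standard BM25 formula. This is only a sketch: the function names are hypothetical, and the defaults k1 = 1.2 and b = 0.75 are commonly cited values assumed here rather than taken from any particular engine.

```python
def raw_tf(tf):
    """Raw term frequency: grows linearly, with no cap."""
    return float(tf)

def bm25_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Term-frequency component of BM25 (the IDF factor is omitted).

    k1 controls how quickly repeated occurrences saturate;
    b controls how strongly long documents are penalized.
    """
    length_norm = 1 - b + b * doc_len / avg_doc_len
    return tf * (k1 + 1) / (tf + k1 * length_norm)

# Diminishing returns: the BM25 component approaches k1 + 1, raw TF does not.
for tf in (1, 5, 100):
    print(tf, raw_tf(tf), round(bm25_tf(tf, doc_len=100, avg_doc_len=100), 3))

# Length normalization: same count, longer document, lower score.
short_doc = bm25_tf(5, doc_len=100, avg_doc_len=100)
long_doc = bm25_tf(5, doc_len=200, avg_doc_len=100)
print(short_doc > long_doc)
```

With these assumed parameters the BM25 component can never exceed k1 + 1, so repeating a word a hundred times helps only slightly more than repeating it five times.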

For SEO and content audits, TF-IDF is still used to compare your product copy against top-ranking competitors and find topical gaps. As a search ranker, it’s mostly historical.

Related terms