Modern search is two-stage: a cheap, broad retriever (BM25, vector ANN) gets the top 100–500 candidates, then a costly re-ranker scores each candidate against the query and produces the final ordering. The re-ranker can be a cross-encoder transformer, a learning-to-rank GBDT, or an LLM with a relevance prompt.
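As a concrete sketch of that shape in Python, here's a minimal pipeline using `rank_bm25` for the retriever and a sentence-transformers cross-encoder for the re-ranker; the toy catalog and the specific checkpoint are illustrative choices, not requirements:

```python
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

docs = [
    "noise-cancelling wireless headphones, 30h battery",
    "wired studio headphones with detachable cable",
    "bluetooth earbuds with charging case",
    # ... a real catalog has millions of rows
]

# Stage 1: cheap, broad retrieval over the whole corpus.
bm25 = BM25Okapi([d.lower().split() for d in docs])
query = "wireless headphones"
bm25_scores = bm25.get_scores(query.lower().split())
candidates = sorted(range(len(docs)),
                    key=lambda i: bm25_scores[i], reverse=True)[:100]

# Stage 2: expensive pairwise scoring, but only over ~100 candidates.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pair_scores = reranker.predict([(query, docs[i]) for i in candidates])
final = [docs[i] for _, i in sorted(zip(pair_scores, candidates), reverse=True)]
```

The asymmetry is the whole point: BM25 touches every document but does almost no work per document, while the cross-encoder does a full transformer forward pass per pair but only ever sees the shortlist.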
For ecommerce, the re-rank stage is where business signals get injected: in-stock status, margin, recency, conversion rate, brand authority. The retriever stays focused purely on textual relevance; the re-ranker decides what actually surfaces.
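One common way to inject those signals is a linear blend on top of the re-ranker's relevance score. The weights below are hand-set purely for illustration; in production they would be learned from click and order data with a learning-to-rank objective:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    title: str
    relevance: float        # re-ranker output (e.g. cross-encoder score)
    in_stock: bool
    margin: float           # normalized gross margin, 0..1
    conversion_rate: float  # historical CVR, 0..1
    days_since_listed: int

def final_score(c: Candidate) -> float:
    # Illustrative hand-set weights; real systems learn these.
    recency = 1.0 / (1.0 + c.days_since_listed / 30.0)
    score = (1.0 * c.relevance
             + 0.3 * c.margin
             + 0.5 * c.conversion_rate
             + 0.2 * recency)
    # Hard business rule: out-of-stock items sink to the bottom.
    return score if c.in_stock else score - 100.0

candidates = [
    Candidate("wireless headphones", 0.92, True, 0.40, 0.06, 12),
    Candidate("bluetooth earbuds", 0.88, False, 0.60, 0.09, 3),
]
ranked = sorted(candidates, key=final_score, reverse=True)
```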
Latency budget matters: a cross-encoder re-ranker adds 50–200ms to score the top 100 candidates. If you can't afford that, use a distilled bi-encoder or a learned linear model; both score a candidate set in under 10ms and capture most of the cross-encoder's lift.
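Here's a minimal sketch of the fast bi-encoder path, assuming document embeddings are precomputed offline (the MiniLM checkpoint is just one reasonable choice). At query time the only model call is a single query encoding; scoring the candidates is one matrix multiply:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

docs = ["noise-cancelling wireless headphones", "wired studio headphones",
        "bluetooth earbuds with charging case"]
query = "wireless headphones"

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Offline: embed the catalog once and store the vectors in your index.
doc_embs = model.encode(docs, normalize_embeddings=True)  # shape (N, 384)

# Online: one small forward pass for the query, then a dot product over
# the candidate rows. The matmul is microseconds; the query encode is
# the few-millisecond cost that keeps this inside the budget.
q = model.encode(query, normalize_embeddings=True)        # shape (384,)
scores = doc_embs @ q                                     # cosine (normalized)
order = np.argsort(-scores)
```

The trade is expressiveness for speed: the bi-encoder never sees query and document together, so it misses the token-level interactions a cross-encoder catches, which is exactly the gap distillation tries to close.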