Cross-Encoder

A transformer that scores a query and a document together in a single forward pass — slower than bi-encoders but substantially more accurate, typically used as a re-ranker over the top-N candidates from a faster retriever.

A cross-encoder reads the query and a candidate product as a single concatenated input and outputs one relevance score per pair. Because both texts attend to each other token-by-token, it captures fine-grained interactions a bi-encoder’s pre-computed embeddings cannot.

The trade-off is speed. Bi-encoders embed queries and documents independently — documents once, at index time — so retrieval over millions of items is sublinear (roughly O(log N) with ANN indexes such as HNSW). Cross-encoders must run the full model once per query×document pair, so they’re only feasible on the top-N (typically 50–500) candidates a faster retriever returns. This is the two-stage retrieve-then-rerank pattern modern ecommerce search uses.
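A minimal sketch of the two-stage pattern described above. The "bi-encoder" and "cross-encoder" here are toy stand-ins (token-overlap heuristics, not real transformers) — the point is the pipeline shape: cheap precomputed-embedding retrieval first, then an expensive pairwise scorer over only the top-N survivors.

```python
def embed(text):
    # Stage-1 stand-in: an independent, cacheable representation per text
    # (a real bi-encoder would produce a dense vector here).
    return set(text.lower().split())

def bi_encoder_score(q_emb, d_emb):
    # Cheap similarity over precomputed embeddings (Jaccard overlap).
    return len(q_emb & d_emb) / len(q_emb | d_emb)

def cross_encoder_score(query, doc):
    # Stage-2 stand-in: reads both texts together, so it can reward
    # interactions (e.g. the query appearing as an exact phrase).
    overlap = len(set(query.lower().split()) & set(doc.lower().split()))
    phrase_bonus = 2.0 if query.lower() in doc.lower() else 0.0
    return overlap + phrase_bonus

def search(query, docs, top_n=3):
    q_emb = embed(query)
    index = [(d, embed(d)) for d in docs]  # built once, at index time
    # Stage 1: fast retrieval narrows millions of docs to top_n candidates.
    candidates = sorted(index, key=lambda x: bi_encoder_score(q_emb, x[1]),
                        reverse=True)[:top_n]
    # Stage 2: one model call per surviving pair — only top_n of them.
    return sorted((d for d, _ in candidates),
                  key=lambda d: cross_encoder_score(query, d),
                  reverse=True)

docs = [
    "red running shoes for men",
    "running socks in red",
    "men's lightweight red running shoes",
    "blue dress shoes",
]
print(search("red running shoes", docs, top_n=3))
```

In production the two stages would be a vector index (e.g. FAISS or a vector database) and a hosted or local cross-encoder model; the control flow stays the same.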

Popular models: BGE-reranker, Cohere Rerank, Jina Reranker, and the ms-marco MiniLM cross-encoders. For most stores the lift over bi-encoder-only ranking is typically 5–15% NDCG@10 — meaningful, but it adds roughly 50–200ms of latency per request.

Related terms