Query Understanding

The NLP pipeline that classifies, rewrites, and enriches a raw search query before retrieval — extracting intent, entities, attributes, and constraints.

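Query understanding sits between the search box and the retrieval layer. Its job is to turn “red dress under $50 size medium” into a structured query: keyword=“dress”, color=red, price<=50, size=M. Done well, it improves precision, because extracted constraints become filters rather than loose keyword matches, and recall, because normalization and expansion match more of the relevant items. It also lets faceted filters be applied automatically.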
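
The transformation above can be sketched with a few rules. This is a minimal illustration, not a production parser: the vocabularies `COLORS` and `SIZES`, the function `parse_query`, and the output field names are assumptions made for the example, and it only handles the “under $N” and “size X” patterns.

```python
import re

# Hypothetical vocabularies; a real system would load these from catalog facets.
COLORS = {"red", "blue", "black", "white", "green"}
SIZES = {"small": "S", "medium": "M", "large": "L"}

PRICE_RE = re.compile(r"\bunder\s+\$?(\d+(?:\.\d+)?)")
SIZE_RE = re.compile(r"\bsize\s+(\w+)")


def parse_query(raw: str) -> dict:
    """Split a raw query into free-text keywords and structured filters."""
    text = raw.lower()
    filters = {}

    # Price ceiling: "under $50" -> price_max = 50.0, then cut it from the text.
    price = PRICE_RE.search(text)
    if price:
        filters["price_max"] = float(price.group(1))
        text = text[:price.start()] + " " + text[price.end():]

    # Size: "size medium" -> size = "M", only if the value is a known size.
    size = SIZE_RE.search(text)
    if size and size.group(1) in SIZES:
        filters["size"] = SIZES[size.group(1)]
        text = text[:size.start()] + " " + text[size.end():]

    # The first color word becomes a filter; everything else stays a keyword.
    keywords = []
    for token in text.split():
        if token in COLORS and "color" not in filters:
            filters["color"] = token
        else:
            keywords.append(token)

    return {"keywords": " ".join(keywords), "filters": filters}


result = parse_query("red dress under $50 size medium")
```

Here `result["keywords"]` is "dress" and `result["filters"]` holds the color, price ceiling, and size, which the retrieval layer could apply as facet filters instead of matching them as text.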

Components typically include: tokenization, normalization (lowercasing, stemming, stopword removal), entity recognition (brands, sizes, colors), intent classification (informational vs transactional vs navigational), and query expansion via synonyms. Some modern systems add an LLM pass on top to handle ambiguous queries.
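
A rough sketch of how some of these stages chain together is shown below. It covers normalization, stopword removal, synonym expansion, and a keyword-rule intent guess; stemming, entity recognition, and any LLM step are left out. All word lists and function names are hypothetical, and a real system would usually use a trained intent classifier rather than rules.

```python
import re

# Hypothetical resources for illustration; real systems use curated or learned ones.
STOPWORDS = {"a", "an", "the", "for", "of", "in"}
SYNONYMS = {"sneakers": ["trainers"], "couch": ["sofa"]}
NAVIGATIONAL = {"login", "account", "returns", "contact"}
INFORMATIONAL = {"how", "what", "why", "guide", "vs"}


def normalize(raw: str) -> list[str]:
    """Lowercase, tokenize on runs of letters/digits, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", raw.lower())
    return [t for t in tokens if t not in STOPWORDS]


def expand(tokens: list[str]) -> list[list[str]]:
    """Return, per token, the token plus its synonyms (one OR-group each)."""
    return [[t] + SYNONYMS.get(t, []) for t in tokens]


def classify_intent(tokens: list[str]) -> str:
    """Keyword-rule intent guess, defaulting to transactional."""
    if any(t in NAVIGATIONAL for t in tokens):
        return "navigational"
    if any(t in INFORMATIONAL for t in tokens):
        return "informational"
    return "transactional"


def understand(raw: str) -> dict:
    """Run the stages in order and collect their outputs."""
    tokens = normalize(raw)
    return {
        "tokens": tokens,
        "expanded": expand(tokens),
        "intent": classify_intent(tokens),
    }


result = understand("The best sneakers for running")
```

Each OR-group in `expanded` can be sent to the retrieval layer as alternatives for one query term, so “sneakers” also matches products described as “trainers”.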

For ecommerce, the most impactful query understanding wins are usually the simplest: detecting brand names so they are applied as filters rather than left to BM25 scoring (which can over-weight them as rare terms), and pulling out attributes that match facets. A query rewriter that turns “womens running shoes nike size 8” into a clean product search often does more for relevance than a sophisticated ranking model.
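
As an illustration of brand detection, the sketch below matches tokens against a small gazetteer, preferring longer phrases so that multi-word brands are caught whole, and moves the brand and size out of the keyword text into filters. The `BRANDS` table, the function names, and the output shape are assumptions for the example; a real gazetteer would be built from the catalog.

```python
# Hypothetical brand gazetteer keyed by lowercase token tuples.
BRANDS = {
    ("nike",): "Nike",
    ("new", "balance"): "New Balance",
    ("under", "armour"): "Under Armour",
}
MAX_BRAND_LEN = max(len(key) for key in BRANDS)


def extract_brand(tokens):
    """Find the longest brand phrase; return (brand or None, remaining tokens)."""
    for n in range(MAX_BRAND_LEN, 0, -1):  # try longer phrases first
        for i in range(len(tokens) - n + 1):
            key = tuple(tokens[i:i + n])
            if key in BRANDS:
                return BRANDS[key], tokens[:i] + tokens[i + n:]
    return None, tokens


def rewrite(raw):
    """Move brand and size out of the keyword text and into filters."""
    tokens = raw.lower().split()
    brand, rest = extract_brand(tokens)
    filters = {}
    if brand:
        filters["brand"] = brand
    # "size <value>" becomes a size filter when a value follows the word.
    if "size" in rest:
        i = rest.index("size")
        if i + 1 < len(rest):
            filters["size"] = rest[i + 1]
            rest = rest[:i] + rest[i + 2:]
    return {"keywords": " ".join(rest), "filters": filters}


result = rewrite("womens running shoes nike size 8")
```

With the brand handled as an exact filter, the remaining keywords (“womens running shoes”) are the only terms left for text scoring, so the brand token no longer influences the BM25 score.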
