Visual Search

Image-to-product search — upload or paste a photo, get matching catalog items based on visual similarity rather than text.

Visual search uses an image embedding model (CLIP, SigLIP, or a fashion-specific variant) to convert both query images and catalog product photos into vectors. Retrieval is then nearest-neighbor in that visual-embedding space. The same vector index you use for semantic text search can serve visual queries too — just index a different embedding.
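Here is a minimal sketch of that flow, assuming a CLIP checkpoint loaded through the sentence-transformers library; the model name, file paths, and SKU identifiers are illustrative placeholders, not a prescribed stack.

```python
# Minimal visual-search sketch: embed catalog photos and a query upload with CLIP,
# then rank catalog items by cosine similarity. Model and file names are placeholders.
import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-B-32")  # any CLIP/SigLIP-style image encoder works

def embed_images(paths):
    """Encode product photos (or a shopper upload) into unit-norm vectors."""
    vecs = model.encode([Image.open(p) for p in paths], convert_to_numpy=True)
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

# Index side: embed one or more photos per catalog item ahead of time.
catalog_paths = ["sku_101_front.jpg", "sku_102_front.jpg", "sku_103_front.jpg"]
catalog_vecs = embed_images(catalog_paths)

# Query side: embed the upload and take nearest neighbors by cosine similarity.
query_vec = embed_images(["shopper_upload.jpg"])[0]
scores = catalog_vecs @ query_vec            # cosine similarity on unit-norm vectors
for rank, idx in enumerate(np.argsort(-scores)[:3], 1):
    print(rank, catalog_paths[idx], round(float(scores[idx]), 3))
```

In production the brute-force dot product would be replaced by the same approximate nearest-neighbor index used for text embeddings, with the image vectors stored as a separate field or namespace.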

For ecommerce, visual search shines in apparel, home goods, and any vertical where “show me something that looks like this” is a natural shopper behavior. Conversion rates in visual-search-driven sessions tend to be markedly higher than in text-search sessions because the shopper has already shown you what they want.

Practical caveats: indoor lighting and shadows in user-uploaded photos hurt match quality; you generally need to embed multiple angles or color variants per product; and a visual + text hybrid (image for shape and style, text for category disambiguation) outperforms pure visual search on noisy uploads.
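One way to sketch that hybrid, continuing the example above: because CLIP embeds images and text into the same space, catalog images can be scored against a category phrase and blended with the visual score. The weighting and the query string here are assumptions, not a fixed recipe.

```python
# Hybrid re-ranking sketch: blend visual similarity with a text/category signal.
# Assumes `model`, `catalog_vecs`, `catalog_paths`, and `query_vec` from the previous snippet.
alpha = 0.7                                    # weight on the visual signal (assumed value)
text_query = "women's midi dress"              # shopper-supplied or inferred category text

text_vec = model.encode(text_query, convert_to_numpy=True)
text_vec = text_vec / np.linalg.norm(text_vec)

image_scores = catalog_vecs @ query_vec        # visual similarity, as above
text_scores = catalog_vecs @ text_vec          # CLIP's text encoder shares the embedding space
hybrid = alpha * image_scores + (1 - alpha) * text_scores

best = int(np.argmax(hybrid))
print(catalog_paths[best], round(float(hybrid[best]), 3))
```

An alternative is to use the text signal as a hard filter (restrict candidates to the predicted category) and rank only by visual similarity within it; which works better depends on how noisy the uploads are.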

Related terms