Hybrid search

From llmref.wiki
Hybrid search: a retrieval approach that combines dense embeddings with sparse lexical matching to improve recall in information retrieval tasks.

Overview

Hybrid search integrates two complementary retrieval paradigms: dense embedding-based retrieval (semantic search) and sparse lexical retrieval (typically BM25 or TF-IDF scoring). Dense embeddings capture semantic similarity and conceptual relationships, while sparse lexical methods perform well on exact term matches and rare-word queries. Combining both signals improves recall, because a relevant document missed by one retriever can still be surfaced by the other.

The approach rests on the observation that neither dense nor sparse retrieval dominates across all query types and datasets. Dense embeddings may fail to retrieve documents containing exact keywords that a user explicitly specified, while sparse methods cannot capture paraphrases or synonyms without explicit query expansion. Hybrid search balances these trade-offs by running both retrievers in parallel, then merging or reranking the results according to a fusion strategy.

Hybrid search is commonly employed in retrieval-augmented generation (RAG) systems, particularly where a chunking strategy divides source material into discrete passages that must be located and ranked before they are passed as context to a language model.

How it works

Hybrid search typically follows this workflow:

  1. Parallel retrieval: A query is submitted simultaneously to a dense retriever (usually a vector database storing embeddings) and a sparse retriever (an inverted index with BM25 or similar scoring).
  2. Score normalization: Dense similarity scores and BM25 scores fall in different ranges, so results from both retrievers are rescaled to a comparable range, commonly with min-max normalization. Rank-based methods such as reciprocal rank fusion (RRF) avoid this step by combining rank positions instead of raw scores.
  3. Score combination: Normalized scores are combined using a weighted sum: hybrid_score = α × dense_score + (1 − α) × sparse_score, where α, between 0 and 1, is a tunable weight. Alternative fusion strategies include the harmonic mean of the two scores or logical AND/OR combination of the two result sets.
  4. Ranking and filtering: Combined results are sorted by final score, and typically the top K results are returned.
  5. Optional reranking: A reranker model may apply an additional learned ranking step to refine the top candidates before they are presented or placed in a context window for language model processing.
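The normalization and fusion steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular library: the function names (min_max, weighted_fusion, reciprocal_rank_fusion), the dictionary-of-scores input format, and the sample scores are all assumptions made for the example. The RRF constant c = 60 is a commonly used default, not a requirement.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_fusion(dense: Dict[str, float], sparse: Dict[str, float],
                    alpha: float = 0.5, k: int = 10) -> List[str]:
    """hybrid_score = alpha * dense_score + (1 - alpha) * sparse_score."""
    d, s = min_max(dense), min_max(sparse)
    # A document missing from one retriever's results contributes 0 for it.
    fused = {doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
             for doc in set(d) | set(s)}
    # Sort by descending score; break ties by document id for determinism.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))[:k]


def reciprocal_rank_fusion(rankings: List[List[str]],
                           c: int = 60, k: int = 10) -> List[str]:
    """Rank-based alternative: sum 1 / (c + rank) over all input rankings."""
    fused: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (c + rank)
    return sorted(fused, key=lambda doc: (-fused[doc], doc))[:k]


# Hypothetical raw scores: cosine similarities and BM25 scores.
dense_hits = {"d1": 0.82, "d2": 0.75, "d3": 0.40}
sparse_hits = {"d2": 12.0, "d4": 9.0, "d1": 3.0}

print(weighted_fusion(dense_hits, sparse_hits, alpha=0.5))
print(reciprocal_rank_fusion([["d1", "d2", "d3"], ["d2", "d4", "d1"]]))
```

In this toy data, document d2 ranks first under both methods because it scores well with both retrievers, while d3 and d4, each found by only one retriever, still appear in the merged list. A reranker, if used, would operate on this merged output.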

The weight parameter α is typically tuned empirically on a validation set. Some systems adapt α dynamically based on query characteristics: keyword-heavy queries may increase sparse weight, while open-ended queries may favor dense retrieval.
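One simple way to tune α empirically is a grid search that maximizes mean recall@K over a validation set. The sketch below assumes that scores have already been normalized to [0, 1] and that each validation example supplies dense scores, sparse scores, and a set of known relevant document ids. The names fuse, recall_at_k, and tune_alpha, and the two validation queries, are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Scores = Dict[str, float]  # doc id -> score, assumed normalized to [0, 1]


def fuse(dense: Scores, sparse: Scores, alpha: float, k: int) -> List[str]:
    """Weighted-sum fusion; returns the top-k document ids."""
    fused = {doc: alpha * dense.get(doc, 0.0) + (1 - alpha) * sparse.get(doc, 0.0)
             for doc in set(dense) | set(sparse)}
    return sorted(fused, key=lambda doc: (-fused[doc], doc))[:k]


def recall_at_k(retrieved: List[str], relevant: Set[str]) -> float:
    """Fraction of relevant documents that appear in the retrieved list."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


def tune_alpha(validation: List[Tuple[Scores, Scores, Set[str]]],
               k: int = 10, steps: int = 11) -> Tuple[float, float]:
    """Try evenly spaced alpha values in [0, 1]; keep the first best one."""
    best_alpha, best_recall = 0.0, -1.0
    for i in range(steps):
        alpha = i / (steps - 1)
        mean_recall = sum(
            recall_at_k(fuse(d, s, alpha, k), rel) for d, s, rel in validation
        ) / len(validation)
        if mean_recall > best_recall:
            best_alpha, best_recall = alpha, mean_recall
    return best_alpha, best_recall


# Two hypothetical validation queries: the first is answered by the sparse
# retriever (relevant doc "c"), the second by the dense retriever ("x").
validation_set = [
    ({"a": 0.9, "c": 0.3}, {"c": 1.0, "a": 0.2}, {"c"}),
    ({"x": 1.0, "y": 0.3}, {"y": 0.5}, {"x"}),
]
print(tune_alpha(validation_set, k=1))
```

With these two queries, low α values miss the dense-favored document and high α values miss the keyword-favored one, so the search settles on a balanced weight. A query-adaptive system would replace the single tuned value with a function of the query.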

Distinction from related terms

| Term | Distinction |
| --- | --- |
| Semantic search | Semantic search refers specifically to dense embedding-based retrieval. Hybrid search explicitly combines dense semantic search with sparse lexical retrieval; semantic search is a component within the hybrid framework. |
| Retrieval-augmented generation (RAG) | RAG is a broader pipeline architecture that uses any retrieval mechanism to ground language model outputs. Hybrid search is one retrieval strategy that can be integrated into RAG systems, but RAG does not require hybrid search. |
| Query fan-out | Query fan-out distributes a single query across multiple retrieval systems or models in sequence or parallel, but does not necessarily combine their scores. Hybrid search is a specific fan-out pattern that explicitly merges dense and sparse ranking signals. |
| Reranking | Reranking applies a learned model to re-rank candidate results from a base retriever. Hybrid search is an initial retrieval fusion step; reranking typically operates downstream on hybrid results. |
| Recall optimization | Recall optimization is a general goal. Hybrid search is one technique for improving recall by compensating for coverage gaps in either dense or sparse methods alone. |

Examples

  • Elasticsearch with vector search: Modern Elasticsearch distributions support both BM25 full-text search and vector similarity queries within a single index. A user query can trigger both a dense vector search on embeddings and a traditional text search, with results merged via configurable scoring functions before reranking.
  • LangChain and LlamaIndex integrations: Both RAG frameworks provide hybrid search components that blend a vector store (e.g., Pinecone, Weaviate, Milvus) with a sparse BM25 index. These are frequently used in RAG pipelines feeding foundation models.
  • Weaviate hybrid search: Weaviate's query API includes a hybrid search operator that combines vector search scores with BM25 scores, allowing users to specify α (the weight of dense vs. sparse components) at query time.
