Dense retrieval
Overview
Dense retrieval is a retrieval approach that represents documents and queries as dense, fixed-length vectors (embeddings) and identifies relevant results by measuring their similarity in vector space. Unlike sparse retrieval methods such as BM25, which rely on exact term overlap, dense retrieval can match a query to documents that express related concepts in different vocabulary.
The method relies on an embedding model to convert text into fixed-size numerical vectors, typically 384 to 1536 dimensions depending on the model. A query is encoded into the same vector space, and a nearest-neighbor search (exact, or approximate for large collections) finds the closest document vectors under a similarity or distance measure such as cosine similarity, dot product, or Euclidean distance. This approach is particularly suited to semantic search tasks where relevance depends on conceptual alignment rather than lexical overlap.
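The two measures named above can behave differently on the same vectors. The following is a minimal sketch using hand-written three-dimensional vectors (hypothetical values, not output from any embedding model): cosine similarity ignores vector length, while Euclidean distance does not.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy "embeddings" chosen by hand for illustration only.
query = [1.0, 0.0, 1.0]
doc_close = [2.0, 0.0, 2.0]  # same direction as the query, larger magnitude
doc_far = [0.0, 1.0, 0.0]    # orthogonal to the query

print(cosine_similarity(query, doc_close), euclidean_distance(query, doc_close))
print(cosine_similarity(query, doc_far), euclidean_distance(query, doc_far))
```

When vectors are L2-normalized, ranking by cosine similarity, dot product, or Euclidean distance yields the same order, which is one reason many systems normalize embeddings before indexing.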
Dense retrieval has become a standard component of RAG systems and LLM applications, often combined with sparse retrieval in a hybrid search setup to balance precision and recall. The approach assumes that meaningful semantic information can be captured in learned vector representations, so its quality depends critically on the choice and training of the embedding model.
How it works
Dense retrieval operates through three main stages:
Encoding: An embedding model—often a pre-trained transformer-based encoder such as BERT or a specialized dense retriever—encodes each document and the user query into dense vectors. The model is trained (or fine-tuned) to place semantically similar texts near each other in vector space.
Indexing: Document vectors are stored in a vector database or approximate nearest-neighbor index (such as HNSW, IVF, or LSH) to enable efficient retrieval at scale. These index structures allow sub-linear query time instead of computing the distance to every document, at the cost of occasionally missing a true nearest neighbor.
Retrieval: When a query arrives, it is encoded using the same embedding model. The index returns the top-k documents with the highest similarity scores (typically cosine similarity). Depending on the system design, these candidates are either passed directly to the LLM or first re-ranked by a reranker.
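The three stages above can be sketched end to end in a few lines. This is an illustration under loud assumptions: `toy_embed` is a hypothetical stand-in for an embedding model (a hashed bag-of-words vector that captures word overlap only, not meaning), and the "index" is a flat list searched exhaustively rather than an HNSW or IVF structure.

```python
import math
import zlib

DIM = 256  # hypothetical vector size chosen for this sketch

def toy_embed(text):
    """Stand-in encoder: hashed bag-of-words, L2-normalized.
    A real system would call a trained embedding model here."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def build_index(documents):
    """Indexing stage: store one vector per document in a flat list."""
    return [(doc, toy_embed(doc)) for doc in documents]

def retrieve(index, query, k=2):
    """Retrieval stage: encode the query with the same function,
    score every document, and return the top-k (score, document) pairs."""
    q = toy_embed(query)
    # Vectors are unit length, so the dot product equals cosine similarity.
    scored = [(sum(a * b for a, b in zip(q, vec)), doc) for doc, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

docs = [
    "vector databases store embeddings",
    "bm25 ranks documents by term frequency",
    "transformers encode text into embeddings",
]
index = build_index(docs)
results = retrieve(index, "store embeddings in vector databases", k=2)
for score, doc in results:
    print(round(score, 3), doc)
```

Swapping `toy_embed` for a trained encoder and the flat list for an approximate nearest-neighbor index changes the quality and speed of the system, but not the shape of the pipeline.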
The effectiveness of dense retrieval depends on the quality of the embedding model, the choice of similarity metric, and the chunking strategy applied to documents. Dense retrievers can be further improved through techniques such as distillation from a stronger teacher model (for example a cross-encoder reranker), fine-tuning on domain-specific corpora, or, with some instruction-tuned embedding models, supplying task instructions or examples alongside the query.
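As one illustration of a chunking strategy, the sketch below splits text into fixed-size word windows that overlap, so that a sentence falling on a boundary still appears whole in at least one chunk. The function name and the default sizes are assumptions made for this example; production systems often chunk by tokens, sentences, or document structure instead.

```python
def chunk_words(text, chunk_size=8, overlap=2):
    """Split text into windows of `chunk_size` words, each sharing
    `overlap` words with the previous window."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks

text = "one two three four five six seven eight nine ten eleven twelve"
chunks = chunk_words(text, chunk_size=5, overlap=2)
for chunk in chunks:
    print(chunk)
```

Each chunk would then be embedded and indexed as its own retrieval unit.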
| Term | Distinction |
|---|---|
| BM25 | BM25 is a sparse, keyword-based retrieval method that ranks documents using term frequency, inverse document frequency, and document-length normalization. Dense retrieval uses learned vector representations to capture semantic similarity. BM25 excels at exact term matching (rare words, names, identifiers); dense retrieval handles synonyms and paraphrases. |
| Sparse Retrieval | Sparse methods (including BM25) represent documents as very high-dimensional vectors with mostly zero values, one dimension per vocabulary term. Dense retrieval uses learned, continuous vectors of comparatively low dimensionality (hundreds to a few thousand dimensions) in which most values are non-zero. |
| Hybrid Search | Hybrid search combines dense and sparse retrieval methods (typically embeddings plus BM25) to improve both recall and precision. Dense retrieval alone is a single-method approach; hybrid search is an ensemble strategy. |
| Semantic Search | Semantic search is a broader concept describing any retrieval method that aims to match meaning rather than keywords. Dense retrieval is one implementation of semantic search; others include knowledge graph traversal or explicit semantic models. |
| Contextual Retrieval | Contextual retrieval modifies queries or documents based on conversation history or broader context before retrieval. Dense retrieval is the underlying retrieval mechanism; contextual retrieval enhances the input to that mechanism via query rewriting or similar techniques. |
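
The hybrid search row above describes combining dense and sparse results but does not name a fusion method. One common choice is reciprocal rank fusion (RRF), sketched here as an assumption rather than as the method any particular system uses. The document ids and the two input rankings are hypothetical; `k=60` is a conventional default for the smoothing constant.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids. Each list contributes
    1 / (k + rank) to a document's score, with ranks starting at 1."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

dense_ranking = ["d3", "d1", "d2"]   # hypothetical dense retriever output
sparse_ranking = ["d1", "d4", "d3"]  # hypothetical BM25 output
fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

RRF uses only rank positions, so it avoids having to calibrate cosine similarities against BM25 scores, which are on different scales.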
Examples
DPR (Dense Passage Retrieval): Introduced by Facebook AI in 2020, DPR trains a dual encoder, with separate BERT-based encoders for questions and passages, on question-passage pairs to retrieve Wikipedia passages for open-domain question answering. It outperformed BM25 in top-k retrieval accuracy on most of the benchmarks it was evaluated on by learning to recognize semantic relevance beyond keyword overlap.
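Dual encoders of this kind are commonly trained with a contrastive objective that uses in-batch negatives: each question's paired passage is the positive, and the other passages in the batch act as negatives. The following is a minimal pure-Python sketch of that loss, using dot-product scores and hand-written two-dimensional vectors (hypothetical values). It is not DPR's actual training code.

```python
import math

def in_batch_negative_loss(question_vecs, passage_vecs):
    """Mean negative log-likelihood where passage i is the positive for
    question i and all other passages in the batch are negatives."""
    losses = []
    for i, q in enumerate(question_vecs):
        scores = [sum(a * b for a, b in zip(q, p)) for p in passage_vecs]
        max_score = max(scores)  # subtract the max for numerical stability
        log_norm = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
        losses.append(log_norm - scores[i])
    return sum(losses) / len(losses)

questions = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[3.0, 0.0], [0.0, 3.0]]  # each positive points the same way as its question
swapped = [[0.0, 3.0], [3.0, 0.0]]  # each positive points away from its question

print(in_batch_negative_loss(questions, aligned))
print(in_batch_negative_loss(questions, swapped))
```

The loss is low when each question scores its own passage above the rest of the batch, which is the behavior the encoders are trained toward.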
ColBERT (Contextualized Late Interaction over BERT): A retrieval method that represents queries and documents as sets of token-level embeddings and performs late interaction: at retrieval time, each query token embedding is matched to its most similar document token embedding, and these maximum similarities are summed to give the document score. ColBERT reported effectiveness competitive with BERT-based rerankers on MS MARCO and TREC CAR at substantially lower query latency.
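The late-interaction score can be sketched as a sum of per-query-token maxima, often called MaxSim. The token vectors below are hand-written two-dimensional values for illustration, not ColBERT outputs, and the function is a simplified sketch rather than the library's implementation.

```python
def maxsim_score(query_tokens, doc_tokens):
    """For each query token vector, take its maximum dot product with any
    document token vector, then sum those maxima."""
    total = 0.0
    for q in query_tokens:
        total += max(sum(a * b for a, b in zip(q, d)) for d in doc_tokens)
    return total

query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # has a good match for both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]              # matches only the first query token

print(maxsim_score(query_tokens, doc_a))
print(maxsim_score(query_tokens, doc_b))
```

Because document token embeddings can be computed and stored ahead of time, only the query has to be encoded when a search is issued.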
Vector Database Applications: Production RAG systems (such as those built with LangChain or LlamaIndex) routinely use dense retrievers with vector databases like Pinecone, Weaviate, or Milvus to retrieve relevant document chunks. The chunks are then passed to an LLM so that generated answers are grounded in retrieved text, which helps reduce hallucination.
See also
- Embeddings — the numerical representations used in dense retrieval
- Embedding model — the model that produces dense vectors
- Vector database — infrastructure for storing and querying dense vectors at scale
- Retrieval-augmented generation — the broader RAG framework that often uses dense retrieval
- Hybrid search — combination of dense and sparse retrieval methods
- Reranker — post-retrieval re-ranking to refine dense retrieval results
- Semantic search — broader concept of meaning-based retrieval