Vector database
Overview
A vector database is a specialized data management system designed to handle the storage and retrieval of high-dimensional vectors at scale. Unlike traditional relational databases that organize data in rows and columns, vector databases are optimized for semantic similarity search, approximate nearest neighbor (ANN) search, and other vector-based operations. This architectural distinction reflects the emergence of embedding-based AI workflows, where unstructured data (text, images, audio) is converted into dense numerical vectors that capture semantic meaning.
Vector databases serve as critical infrastructure for retrieval-augmented generation (RAG) pipelines, agent memory systems, and other LLM-era applications that require fast access to semantically similar content. The core function is to accept a query vector and return the K nearest neighbors in a pre-indexed corpus, where nearness is measured by a similarity or distance function such as cosine similarity, Euclidean distance, or dot product (higher is closer for similarities, lower is closer for distances). Real-time AI applications typically need this operation to complete within milliseconds, which is why vector databases rely on specialized indexing structures and approximate search algorithms rather than exact nearest neighbor computation over the whole corpus.
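The three measures named above, and the exact (brute-force) K-nearest-neighbor search that approximate methods are designed to avoid at scale, can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names and the tiny `corpus` are hypothetical, not part of any particular product's API.

```python
import heapq
import math

def dot(a, b):
    """Dot product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: larger means more similar."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def exact_top_k(query, corpus, k):
    """Score every vector in the corpus and keep the k most similar (O(N) per query)."""
    scored = ((cosine_similarity(query, vec), doc_id) for doc_id, vec in corpus.items())
    return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]

# Hypothetical 2-dimensional corpus; real embeddings have hundreds of dimensions.
corpus = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
}
results = exact_top_k([1.0, 0.1], corpus, k=2)
```

Because every stored vector is scored on every query, the cost grows linearly with corpus size, which is the motivation for the approximate indexes described later.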
Vector databases differ fundamentally from document search engines and traditional databases in both data structure and query semantics. They assume that meaning is encoded in vector space proximity, not keyword matching or relational predicates. This assumption enables applications such as semantic grounding for large language models, cross-modal retrieval, and in-context learning scenarios where the most relevant documents must be identified without explicit filtering logic.
How it works
Vector databases operate through a multi-stage pipeline: vectorization, indexing, and query execution.
Vectorization: Input data (documents, images, code snippets) is converted into vectors using a pre-trained embedding model. These vectors typically have between 384 and 3072 dimensions. The embedding model is selected based on the modality and domain of the data; for example, text embeddings might use OpenAI's text-embedding-3-large (3072 dimensions by default) or an open-source alternative such as all-MiniLM-L6-v2 (384 dimensions).
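To make the shape of this step concrete, the sketch below maps text to a fixed-length, unit-normalized vector. It is deliberately a toy: `toy_embed` is a hypothetical stand-in that hashes tokens into buckets and captures no semantic meaning, whereas a real pipeline would call a trained embedding model at this point. What it does show is the contract a vector database relies on: every input becomes a vector of the same dimension, and the same function is used for documents and queries.

```python
import hashlib
import math

DIM = 8  # assumption for illustration; real models use hundreds to thousands of dimensions

def toy_embed(text, dim=DIM):
    """Hash each lowercase token into one of `dim` buckets, then L2-normalize.

    A stand-in for a real embedding model: deterministic, fixed-length output,
    but with no notion of meaning.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

doc_vec = toy_embed("vector databases store embeddings")
query_vec = toy_embed("Vector databases store embeddings")  # same text, different case
```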
Indexing: Raw vectors are indexed using approximate nearest neighbor algorithms to enable sublinear query time. Common indexing strategies include:
- Hierarchical Navigable Small World (HNSW)
- Product Quantization (PQ)
- Locality-Sensitive Hashing (LSH)
- Inverted file index with product quantization (IVF-PQ)
These structures trade some recall (a fraction of the true nearest neighbors may be missed) for speed and memory efficiency, making search practical for billion-scale corpora.
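As a rough illustration of how an index narrows the search, the sketch below implements a toy version of one listed strategy, locality-sensitive hashing with random hyperplanes. The class name `HyperplaneLSH` and its parameters are hypothetical, and the design is simplified to a single hash table; practical LSH indexes usually combine several tables to recover neighbors that fall into different buckets.

```python
import math
import random
from collections import defaultdict

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(vec):
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]

class HyperplaneLSH:
    """Toy LSH index: vectors on the same side of every random hyperplane share a bucket."""

    def __init__(self, dim, num_planes, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]
        self.buckets = defaultdict(list)

    def _signature(self, vec):
        # One bit per hyperplane: which side of the plane the vector lies on.
        return tuple(dot(plane, vec) >= 0.0 for plane in self.planes)

    def add(self, item_id, vec):
        self.buckets[self._signature(vec)].append((item_id, vec))

    def query(self, vec, k):
        # Only the query's own bucket is scanned, so neighbors that landed in
        # other buckets are missed: this is the recall-for-speed trade.
        candidates = self.buckets.get(self._signature(vec), [])
        ranked = sorted(candidates, key=lambda item: dot(item[1], vec), reverse=True)
        return [item_id for item_id, _ in ranked[:k]]

# Index 1000 random unit vectors (synthetic data for illustration only).
rng = random.Random(42)
index = HyperplaneLSH(dim=16, num_planes=4)
vectors = {}
for i in range(1000):
    vectors[i] = normalize([rng.gauss(0.0, 1.0) for _ in range(16)])
    index.add(i, vectors[i])

hits = index.query(vectors[0], k=5)
```

With 4 hyperplanes there are at most 16 buckets, so a query scans only a fraction of the corpus. Adding planes shrinks the buckets further, which speeds up queries and lowers recall.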
Query execution: At query time, an incoming query is vectorized using the same embedding model. The database then searches the index to identify the K vectors with the smallest distance (or highest similarity) to the query vector. Results are typically ranked and returned with distance scores, which can inform retrieval precision and recall metrics.
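One common way to quantify what an approximate index gives up is recall@K: the fraction of the true K nearest neighbors (found by exact search) that the approximate search also returned. The helper below is a minimal sketch; the function name and the document IDs are hypothetical.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-K neighbors that the approximate search returned."""
    if not exact_ids:
        raise ValueError("exact_ids must contain at least one ID")
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

exact = ["d1", "d2", "d3", "d4"]    # ground truth from an exact search
approx = ["d1", "d3", "d7", "d2"]   # what an approximate index returned
recall = recall_at_k(approx, exact)  # 3 of the 4 true neighbors were found: 0.75
```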
Metadata and filtering: Modern vector databases support filtered search, which combines vector similarity with scalar metadata filters (e.g., date range, document type). Many also offer hybrid search, which blends vector similarity with keyword-based scoring. Both allow queries to balance semantic relevance with categorical or lexical constraints, improving the quality of results fed to RAG systems.
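A minimal sketch of metadata filtering follows, assuming a pre-filtering design in which the scalar predicate is applied first and only the surviving records are ranked by similarity. Real systems may instead filter during or after the index traversal; the record layout and the `filtered_search` function are hypothetical.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def filtered_search(query_vec, records, k, predicate):
    """Pre-filter records on metadata, then rank the survivors by cosine similarity."""
    candidates = [r for r in records if predicate(r["metadata"])]
    ranked = sorted(
        candidates,
        key=lambda r: cosine_similarity(query_vec, r["vector"]),
        reverse=True,
    )
    return [r["id"] for r in ranked[:k]]

# Hypothetical records: a vector plus scalar metadata.
records = [
    {"id": "r1", "vector": [1.0, 0.0], "metadata": {"type": "report", "year": 2021}},
    {"id": "r2", "vector": [0.9, 0.1], "metadata": {"type": "memo", "year": 2023}},
    {"id": "r3", "vector": [0.6, 0.8], "metadata": {"type": "report", "year": 2023}},
    {"id": "r4", "vector": [0.0, 1.0], "metadata": {"type": "report", "year": 2024}},
]

# Only reports from 2023 onward are eligible, even though r1 and r2 are closer to the query.
hits = filtered_search(
    [1.0, 0.0],
    records,
    k=2,
    predicate=lambda m: m["type"] == "report" and m["year"] >= 2023,
)
```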
| Term | Distinction |
|---|---|
| Knowledge graph | A knowledge graph explicitly encodes entities and relationships as structured data (subject–predicate–object triples). A vector database encodes meaning implicitly in continuous vector space. Knowledge graphs support logical inference and structured queries; vector databases support semantic similarity. The two are complementary: the entities and relationships of a knowledge graph can be embedded as vectors and indexed in a vector database. |
| Embeddings | An embedding is the numerical representation itself—a vector produced by an embedding model. A vector database is the storage and retrieval system for those embeddings. The embedding is the data; the vector database is the infrastructure. |
| Traditional relational database with vector extension | Some relational databases add vector search capabilities, for example PostgreSQL with the pgvector extension or MySQL with its vector functions. These remain general-purpose systems whose primary design goals are ACID semantics and structured queries, with vector search as an added feature. Dedicated vector databases are designed around approximate nearest neighbor search speed and scale. |
| Full-text search engine | Full-text search engines (Elasticsearch, Solr) primarily index and retrieve documents based on keyword matching and Boolean logic, although recent versions of both also offer dense-vector search. Vector databases retrieve based on semantic similarity in embedding space. Full-text search is lexical; vector search is semantic. They address different retrieval problems. |
| Cache or in-memory store | Caches and in-memory stores (Redis, Memcached) prioritize retrieval speed for exact lookups by key. Vector databases prioritize approximate similarity search across a large corpus. They solve different problems at different scales. |
Examples
- Pinecone: A managed, cloud-native vector database service that supports indexes of billions of vectors with query latencies that can stay below 100 ms. Commonly used in production RAG pipelines where documents are embedded and indexed for retrieval when answering LLM queries.
- Weaviate: An open-source vector database that supports hybrid search combining vector similarity with keyword (BM25) scoring, along with metadata filtering, exposed through a GraphQL API. Used in agentic workflows where agents must retrieve relevant context from large document corpora to inform tool use and decision-making.
- Chroma: A lightweight, open-source vector database designed for local development and embedding management. Integrated into many LLM frameworks (LangChain, LlamaIndex) to provide simple in-process RAG functionality without requiring a separate database server.
See also
- Embeddings — the numerical vectors that vector databases store and search
- Retrieval-augmented generation — a primary application of vector databases in LLM workflows
- Agent memory vs Context window — vector databases as a mechanism for agent long-term memory
- Retrieval precision and recall — metrics for evaluating vector database search quality
- In-context learning — the downstream task that vector database retrieval enables