Embedding model

Embedding model — A neural network model that converts text into fixed-dimensional dense vectors that preserve semantic meaning.

Overview

An embedding model is a type of foundation model trained to map textual input into a continuous vector space where semantic similarity is reflected in vector proximity. These models produce dense representations, typically a few hundred to a few thousand dimensions (for example 384, 768, 1536, or 3072), that encode meaning in a way suitable for downstream retrieval, clustering, and comparison tasks.

Embedding models are distinct from large language models that generate text. Instead, they are purpose-built for encoding: they consume text and output a single vector per input sequence. This architectural choice makes them efficient for retrieval-augmented generation workflows, where encoded documents and queries must be compared at scale using vector database systems.

The training process typically involves contrastive learning objectives, where the model learns to place semantically similar text pairs close together in vector space and dissimilar pairs far apart. This enables semantic search applications without requiring keyword overlap. Many recent embedding models are additionally instruction-tuned, so that a task description supplied with the input steers the embedding toward a particular retrieval task; they can also be fine-tuned on domain-specific data.

Embedding models form the retrieval component of RAG pipelines, where they encode both queries and document chunks to enable semantic matching before prompt construction. Their performance directly influences retrieval precision and recall in downstream applications.
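
Precision and recall are usually reported at a cutoff k. The following is a minimal sketch of how the two could be computed for one query; the document ids and relevance labels are hypothetical and chosen only for illustration.

```python
def precision_recall_at_k(retrieved, relevant, k):
    """Return (precision@k, recall@k) for a single query.

    retrieved: ranked list of document ids, best match first.
    relevant:  set of ids judged relevant for the query.
    """
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ranking from a retriever and hypothetical relevance judgments.
ranked_ids = ["d3", "d7", "d1", "d9", "d4"]
relevant_ids = {"d1", "d3", "d8"}

p, r = precision_recall_at_k(ranked_ids, relevant_ids, k=3)
print(f"precision@3={p:.2f} recall@3={r:.2f}")
```

Two of the top three results are relevant and two of the three relevant documents were found, so both values are 2/3 here. A better embedding model raises both by ranking relevant chunks higher.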

How it works

Embedding models operate through a two-stage process: contextual encoding of tokens, followed by pooling into a single vector.

First, input text is tokenized using a tokenization scheme (typically subword tokenization). The tokens are passed through a transformer encoder backbone, which applies self-attention across token positions. The transformer produces contextual representations for each token.
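
As an illustration of subword tokenization, the sketch below implements a greedy longest-match split in the style of WordPiece, the scheme used by BERT. The vocabulary is a toy assumption; real models ship vocabularies with tens of thousands of entries and may use a different scheme such as BPE.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word.

    Pieces that continue a word carry a '##' prefix, following the
    WordPiece convention. Returns [unk] if the word cannot be covered.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary (assumption, for illustration only).
toy_vocab = {"embed", "##ding", "##s", "model"}
print(wordpiece_tokenize("embeddings", toy_vocab))  # ['embed', '##ding', '##s']
```

Each resulting piece is mapped to an integer id and then to an input vector before entering the transformer encoder.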

Second, the model aggregates token-level representations into a single vector. Common aggregation strategies include: mean pooling over all tokens, using the representation of a special token (e.g., [CLS]), or attention-weighted pooling. The aggregated representation is optionally normalized to unit length (L2 normalization), placing all vectors on a unit hypersphere.
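
The aggregation step can be sketched in plain Python as follows. The token vectors and attention mask are made-up values standing in for encoder output, and the position of the [CLS]-style token at index 0 is an assumption that holds for BERT-like models but not for every architecture.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the vectors of real tokens; mask entries of 0 mark padding."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [value / count for value in total]


def cls_pool(token_vectors):
    """Use the first token's vector (assumes a [CLS]-style token at position 0)."""
    return list(token_vectors[0])


def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        return list(vector)
    return [v / norm for v in vector]


# Toy encoder output: 4 token positions, 3 dimensions; the last position is padding.
token_vectors = [
    [1.0, 0.0, 2.0],
    [3.0, 4.0, 0.0],
    [2.0, 2.0, 1.0],
    [9.0, 9.0, 9.0],
]
attention_mask = [1, 1, 1, 0]

pooled = mean_pool(token_vectors, attention_mask)  # [2.0, 2.0, 1.0]
embedding = l2_normalize(pooled)                   # unit-length vector
```

Excluding padded positions matters: averaging over the padding row above would shift the pooled vector noticeably.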

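The output vector can then be compared to other vectors using a similarity or distance measure. Cosine similarity, which reduces to a plain dot product when vectors are L2-normalized, is the standard choice; Euclidean distance and other metrics are also supported by vector database systems. For unit-length vectors, cosine similarity and Euclidean distance produce the same ranking.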
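
The relationship between these measures can be checked numerically. This is a small sketch with two arbitrary 2-dimensional vectors; real embeddings have far more dimensions, but the identities are the same.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    return dot(a, b) / (norm(a) * norm(b))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(a):
    n = norm(a)
    return [x / n for x in a]


query = [3.0, 4.0]
doc = [4.0, 3.0]

cos = cosine_similarity(query, doc)  # 24 / 25 = 0.96
q_hat, d_hat = normalize(query), normalize(doc)

# On unit vectors the dot product equals cosine similarity ...
assert math.isclose(dot(q_hat, d_hat), cos)
# ... and squared Euclidean distance equals 2 - 2 * cosine,
# so sorting by either measure yields the same order.
assert math.isclose(euclidean_distance(q_hat, d_hat) ** 2, 2 - 2 * cos)
```

This is why many systems normalize embeddings once at indexing time and then use the cheaper dot product at query time.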

Training uses contrastive objectives such as the InfoNCE loss, which pulls a query vector toward its positive document vectors and pushes it away from negatives. Instruction tuning further improves generalization by fine-tuning on annotated datasets where queries and relevant passages are paired with natural language instructions describing the task (e.g., "Retrieve documents relevant to this question").
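
A minimal sketch of an InfoNCE-style loss with in-batch negatives is shown below. It assumes that documents[i] is the positive for queries[i] and that all other documents in the batch act as negatives; the temperature of 0.05 is an illustrative value, not one taken from any particular model. Actual training computes this on batched tensors with automatic differentiation.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(queries, documents, temperature=0.05):
    """Mean InfoNCE loss over a batch.

    For query i the loss is -log( exp(s_ii / t) / sum_j exp(s_ij / t) ),
    where s_ij is the dot product of query i and document j.
    """
    losses = []
    for i, q in enumerate(queries):
        logits = [dot(q, d) / temperature for d in documents]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


queries = [[1.0, 0.0], [0.0, 1.0]]
aligned_docs = [[1.0, 0.0], [0.0, 1.0]]  # each positive matches its query
swapped_docs = [[0.0, 1.0], [1.0, 0.0]]  # each positive is orthogonal to its query

low = info_nce_loss(queries, aligned_docs)
high = info_nce_loss(queries, swapped_docs)
print(low < high)  # True
```

The loss is near zero when each query is most similar to its own positive and grows when a negative scores higher, which is the gradient signal that shapes the vector space.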

Chunking strategies determine how long-form documents are segmented before embedding, affecting both retrieval coverage and computational cost. Contextual retrieval techniques enhance embedding quality by including surrounding context during encoding.
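
One common chunking strategy is a fixed-size sliding window with overlap. The sketch below splits on whitespace for simplicity; production systems usually count model tokens instead of words, and the default sizes here are arbitrary assumptions. The with_context helper is a hypothetical and deliberately crude stand-in for contextual retrieval: it only prepends a document title to each chunk before encoding.

```python
def chunk_words(text, chunk_size=200, overlap=50):
    """Split text into windows of chunk_size words.

    Consecutive windows share `overlap` words so that sentences cut at a
    boundary still appear intact in at least one chunk.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def with_context(chunk, title):
    """Hypothetical helper: prefix a chunk with its document title."""
    return f"{title}\n\n{chunk}"


document = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(document, chunk_size=4, overlap=1)
print(chunks)  # ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Smaller chunks and larger overlaps increase the number of vectors to compute and store, which is the cost side of the coverage trade-off.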

Distinction from related terms

Term	Distinction
Large language model	LLMs generate sequences of tokens autoregressively; embedding models produce a single fixed vector. LLMs are used for generation; embedding models are used for retrieval and comparison.
Reranker	Embedding models perform first-pass retrieval by vector similarity; rerankers score and order an already-retrieved candidate set using a more computationally expensive comparison. They are often used sequentially in RAG pipelines.
BM25	BM25 is a lexical retrieval method based on term frequency, inverse document frequency, and document length normalization; embedding models perform semantic retrieval by learned vector similarity. BM25 has no learned weights, only a few tunable parameters (typically k1 and b); embedding models require pre-trained weights.
Semantic search	Semantic search is a retrieval task or capability; an embedding model is the technical component that enables semantic search by encoding text into comparable vectors.
Embeddings	"Embeddings" is the general term for vector representations of any data; an "embedding model" is the specific neural network that produces those embeddings.
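
To make the contrast with lexical retrieval concrete, here is a minimal sketch of BM25 scoring. It uses one common formulation of the inverse document frequency term (the variant with +1 inside the logarithm) and the frequently used defaults k1=1.5 and b=0.75; both are assumptions, and search libraries differ in these details. The documents are toy examples.

```python
import math
from collections import Counter


def bm25_scores(query_terms, tokenized_docs, k1=1.5, b=0.75):
    """Score every document against the query with a BM25 variant."""
    n_docs = len(tokenized_docs)
    avg_len = sum(len(d) for d in tokenized_docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for doc in tokenized_docs:
        doc_freq.update(set(doc))
    scores = []
    for doc in tokenized_docs:
        tf = Counter(doc)
        length_norm = 1 - b + b * len(doc) / avg_len
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue  # no exact term match, no contribution
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "the cat sat on the mat".split(),
    "dogs chase cats in the park".split(),
    "embedding models map text to vectors".split(),
]
scores = bm25_scores("cat mat".split(), docs)
```

Only the first document receives a nonzero score. The second contains "cats" but not the exact token "cat", so it scores zero; this kind of vocabulary mismatch is what learned embeddings are meant to bridge.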

Examples

OpenAI text-embedding-3-large (2024) produces 3072-dimensional vectors by default; shorter vectors can be requested through the API's dimensions parameter. Its training details are not fully public. It is widely used in RAG systems and supports encoding of both short queries and long documents.

Sentence-BERT (SBERT) is an open-source embedding model family that fine-tunes BERT-style encoders in a siamese (shared-weight) setup on sentence-pair datasets, using classification, triplet, or contrastive losses depending on the variant. Commonly used variants produce 384- or 768-dimensional vectors and are widely applied in semantic search and clustering.

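Cohere Embed-English-v3.0 produces dense vectors and asks callers to declare an input type (for example, search query versus search document) so that both sides of a retrieval pair are encoded appropriately. The model itself emits dense vectors only; hybrid search approaches combine its output with a separate BM25 lexical retriever and merge the two result lists.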
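
One simple way to merge lexical and dense results in a hybrid search is reciprocal rank fusion (RRF). The sketch below is an illustration of that general technique, not a description of how any vendor implements hybrid search; the rankings are hypothetical and k=60 is the conventional default constant.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked id lists into one.

    Each list contributes 1 / (k + rank) to a document's fused score,
    with rank starting at 1. Ties are broken by document id.
    """
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


lexical_ranking = ["d2", "d1", "d4"]  # hypothetical BM25 result
dense_ranking = ["d1", "d3", "d2"]    # hypothetical embedding-similarity result

fused_ranking = reciprocal_rank_fusion([lexical_ranking, dense_ranking])
print(fused_ranking)  # ['d1', 'd2', 'd3', 'd4']
```

Because RRF uses only ranks, it avoids having to calibrate BM25 scores against cosine similarities, which live on different scales.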
