Embeddings

From llmref.wiki
Embeddings — Numeric vector representations of text (or other data) that place semantically similar content near each other in a high-dimensional space.

Overview

In the context of language models, embeddings are dense, fixed-length numeric vectors that encode the semantic content of text. Two pieces of text with similar meaning tend to have embedding vectors that are close in vector space (as measured by cosine similarity or Euclidean distance), even if the surface text differs entirely. This property makes embeddings the foundation for semantic search, retrieval-augmented generation, and clustering.
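The two closeness measures named above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the four-dimensional vectors are made up for the example and do not come from any real embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors (0.0 = identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical toy embeddings, invented for illustration only.
cat = [0.9, 0.1, 0.0, 0.3]
kitten = [0.8, 0.2, 0.1, 0.4]
invoice = [0.0, 0.9, 0.8, 0.1]

print("cat vs kitten :", cosine_similarity(cat, kitten), euclidean_distance(cat, kitten))
print("cat vs invoice:", cosine_similarity(cat, invoice), euclidean_distance(cat, invoice))
```

With these toy values, the related pair scores a higher cosine similarity and a smaller Euclidean distance than the unrelated pair. Note the opposite orientation of the two measures: larger is closer for cosine similarity, smaller is closer for distance.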

Text embeddings are produced by an embedding model — a neural network that processes input text and outputs a vector. The embedding model is trained on large corpora with an objective that encourages semantic proximity (e.g., contrastive learning on question-answer pairs).
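As a rough illustration of what a contrastive objective computes, the sketch below implements an in-batch loss of the InfoNCE family in plain Python. The article does not say which loss any particular model uses, so the loss form, the function name `info_nce_loss`, and the `temperature` value are all assumptions chosen for the example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce_loss(questions, answers, temperature=0.05):
    """Mean in-batch contrastive loss (assumed InfoNCE form).

    answers[i] is treated as the positive for questions[i]; every other
    answer in the batch acts as a negative.
    """
    total = 0.0
    for i, q in enumerate(questions):
        logits = [dot(q, a) / temperature for a in answers]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the matching answer.
        total += log_denominator - logits[i]
    return total / len(questions)

# Hypothetical unit vectors standing in for encoded questions and answers.
questions = [[1.0, 0.0], [0.0, 1.0]]
answers = [[1.0, 0.0], [0.0, 1.0]]

print("matched pairs   :", info_nce_loss(questions, answers))
print("mismatched pairs:", info_nce_loss(questions, answers[::-1]))
```

The loss is low when each question vector already lies closest to its own answer and high when it lies closer to another answer in the batch, which is the pressure that pulls related texts together during training.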

Common embedding dimensions range from 384 to 3072 depending on the model; higher-dimensional vectors generally encode finer distinctions but require more storage and more compute per comparison.
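The storage side of this trade-off is simple arithmetic. The sketch below assumes uncompressed 32-bit floats (4 bytes per value) and ignores metadata and any index overhead a real vector store would add; the one-million-vector corpus is an arbitrary example size.

```python
def index_size_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw size of the vectors alone, assuming float32 values by default."""
    return num_vectors * dimensions * bytes_per_value

for dims in (384, 1536, 3072):
    gib = index_size_bytes(1_000_000, dims) / 2**30
    print(f"{dims:>4} dims: {gib:.2f} GiB per million vectors")
```

Raw storage grows linearly with dimensionality, so moving from 384 to 3072 dimensions multiplies the footprint by eight.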

LLM-era usage vs. pre-2022 usage

Aspect          | Pre-2022 usage                      | LLM-era usage
Dominant models | Word2Vec, GloVe, BERT               | OpenAI text-embedding-3, Cohere Embed, open models (BGE, E5)
Granularity     | Word or sentence level              | Chunk / passage level for RAG
Primary use     | Semantic similarity, classification | Dense retrieval, vector stores, RAG pipelines
Deployment      | Offline NLP pipelines               | Always-online vector databases (Pinecone, Weaviate, pgvector)

In RAG pipelines, embeddings are generated for each document chunk at index time and for each query at retrieval time; nearest-neighbor search over the index returns the most semantically relevant chunks.
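The index-time and retrieval-time steps can be sketched as follows. The function `toy_embed` is a hypothetical stand-in for a real embedding model: it hashes words into a fixed-length vector, so it captures word overlap only, not meaning. The brute-force scan likewise stands in for the approximate nearest-neighbor search a vector database would perform.

```python
import math
import re
import zlib

def toy_embed(text, dims=256):
    """Hypothetical stand-in for an embedding model: hashed bag-of-words, unit length."""
    vec = [0.0] * dims
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode()) % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def build_index(chunks):
    # Index time: embed each chunk once and keep the vector next to its text.
    return [(chunk, toy_embed(chunk)) for chunk in chunks]

def retrieve(index, query, k=2):
    # Retrieval time: embed the query, then rank every chunk by cosine similarity
    # (a plain dot product here, because all vectors are unit length).
    q = toy_embed(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]

chunks = [
    "Embedding models map text to fixed-length vectors.",
    "Vector databases store embeddings for nearest-neighbor search.",
    "The invoice is due at the end of the month.",
]
index = build_index(chunks)
print(retrieve(index, "How do vector databases store embeddings?", k=1))
```

In a real pipeline the same embedding model must be used for both chunks and queries, since vectors from different models are not comparable.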

Embeddings are not model knowledge

A common misconception: embeddings are not a representation of what a language model knows. The generative model and the embedding model are typically separate systems. Embeddings encode semantic relatedness; they do not store factual content — the facts come from the documents the embeddings index.

Key properties

  • Dimensionality: the length of the vector; higher dimensions typically capture more nuance.
  • Matryoshka embeddings: a training technique that lets a vector be truncated to its leading dimensions with graceful quality degradation.
  • Context length limit: embedding models have a maximum input length (e.g., 512–8192 tokens); depending on the implementation, inputs exceeding this are truncated or rejected.
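Two of these properties translate directly into small helpers, sketched below under stated assumptions. `truncate_embedding` shows Matryoshka-style truncation (keep the leading components, then re-normalize) and is only meaningful for vectors from a model trained that way. `split_into_windows` shows one way to stay under a context length limit; it assumes the token list comes from the embedding model's own tokenizer. Both function names are hypothetical.

```python
import math

def truncate_embedding(vec, dims):
    """Keep the first `dims` components and re-normalize to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(v * v for v in head)) or 1.0
    return [v / norm for v in head]

def split_into_windows(tokens, max_tokens, overlap=0):
    """Split a token list into windows of at most `max_tokens` tokens.

    Consecutive windows share `overlap` tokens so that text near a
    boundary appears in both neighbors.
    """
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")
    step = max_tokens - overlap
    windows = []
    start = 0
    while start < len(tokens):
        windows.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # the last window already reaches the end of the input
        start += step
    return windows

# Illustrative values only.
print(truncate_embedding([3.0, 4.0, 12.0], 2))
print(split_into_windows(list(range(10)), max_tokens=4, overlap=1))
```

Each window would then be embedded separately, which avoids losing content that simple truncation would discard.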
