Embeddings

Embeddings: numeric vector representations of text (or other data) that place semantically similar content near each other in a high-dimensional space.

Overview

In the context of language models, embeddings are dense, fixed-length numeric vectors that encode the semantic content of text. Two pieces of text with similar meaning tend to have embedding vectors that are close in vector space (as measured by cosine similarity or Euclidean distance), even if they share few or no words. This property makes embeddings the enabling technology for semantic search, retrieval-augmented generation (RAG), and clustering.
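The two closeness measures named above can be sketched in a few lines. This is a minimal illustration using only the standard library; the three 4-dimensional vectors are made-up values standing in for real embeddings, which would come from an embedding model and have hundreds or thousands of dimensions.

```python
import math

def dot(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; values near 1.0 mean similar direction."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    """Straight-line distance between a and b; smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Made-up vectors for illustration only (not output of any real model).
cat = [0.9, 0.1, 0.0, 0.3]
kitten = [0.8, 0.2, 0.1, 0.4]
invoice = [0.0, 0.9, 0.8, 0.1]

print(cosine_similarity(cat, kitten), cosine_similarity(cat, invoice))
print(euclidean_distance(cat, kitten), euclidean_distance(cat, invoice))
```

With these values, the "cat" and "kitten" vectors score higher on cosine similarity and lower on Euclidean distance than "cat" and "invoice", which is the behavior expected of embeddings for related versus unrelated text.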

Text embeddings are produced by an embedding model: a neural network that processes input text and outputs a vector. The embedding model is trained on large corpora with an objective that pulls the vectors of related texts together and pushes unrelated ones apart (e.g., contrastive learning on question-answer pairs).
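One common form of contrastive objective treats the other answers in a batch as negatives for each question. The sketch below computes such a loss for already-computed vectors; the function name `info_nce_loss`, the temperature value, and the toy vectors are illustrative assumptions, and the training objective of any particular embedding model may differ.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce_loss(query_vecs, answer_vecs, temperature=0.05):
    """Mean in-batch contrastive loss.

    Query i is paired with answer i; every other answer in the batch is
    treated as a negative. The loss is low when each query scores highest
    against its own answer.
    """
    total = 0.0
    for i, q in enumerate(query_vecs):
        logits = [dot(q, a) / temperature for a in answer_vecs]
        # Numerically stable log-sum-exp over all answers in the batch.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching answer as the correct class.
        total += log_sum_exp - logits[i]
    return total / len(query_vecs)

# Toy unit vectors (assumed values): each query is aligned with its own answer.
queries = [[1.0, 0.0], [0.0, 1.0]]
answers_aligned = [[1.0, 0.0], [0.0, 1.0]]
answers_swapped = [[0.0, 1.0], [1.0, 0.0]]

print(info_nce_loss(queries, answers_aligned))  # close to zero
print(info_nce_loss(queries, answers_swapped))  # much larger
```

Training adjusts the model's weights to reduce this loss, which is what moves matching question and answer vectors closer together.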

Common embedding dimensions range from 384 to 3072 depending on the model; larger dimensions can encode finer distinctions but require proportionally more storage per vector and more compute per comparison.
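The storage side of this trade-off is simple arithmetic. The sketch below assumes each value is stored as a 4-byte float and ignores index structures and metadata, both of which add overhead in a real vector store; the function name is hypothetical.

```python
def index_size_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw size of the vectors alone (assumes 4-byte floats, no index overhead)."""
    return num_vectors * dimensions * bytes_per_value

# One million chunks at the low and high end of the range quoted above.
for dims in (384, 3072):
    size_gb = index_size_bytes(1_000_000, dims) / 1e9
    print(f"{dims} dimensions: {size_gb:.3f} GB")
```

Under these assumptions, one million vectors occupy about 1.5 GB at 384 dimensions and about 12.3 GB at 3072 dimensions, an eightfold difference that matches the ratio of the dimensions.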

LLM-era usage vs. pre-2022 usage

Aspect	Pre-2022 usage	LLM-era usage
Dominant models	Word2Vec, GloVe, BERT	OpenAI text-embedding-3, Cohere Embed, open models (BGE, E5)
Granularity	Word or sentence level	Chunk / passage level for RAG
Primary use	Semantic similarity, classification	Dense retrieval, vector stores, RAG pipelines
Deployment	Offline NLP pipelines	Always-online vector databases (Pinecone, Weaviate, pgvector)

In RAG pipelines, embeddings are generated for each document chunk at index time and for each query at retrieval time; nearest-neighbor search over the index returns the most semantically relevant chunks.
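The index-time and query-time steps can be sketched as follows. The embedder here is a hypothetical stand-in (normalized word counts over a fixed vocabulary) that captures word overlap only, not meaning; a real pipeline would call an embedding model and would usually use a vector database with approximate nearest-neighbor search rather than the brute-force scan shown.

```python
import math

def make_toy_embedder(corpus):
    """Return a stand-in embed(text) function: normalized word counts.

    This is NOT a semantic embedding; it only illustrates the data flow.
    """
    words = sorted({w for text in corpus for w in text.lower().split()})
    vocab = {w: i for i, w in enumerate(words)}

    def embed(text):
        vec = [0.0] * len(vocab)
        for w in text.lower().split():
            if w in vocab:  # words outside the vocabulary are ignored
                vec[vocab[w]] += 1.0
        length = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / length for v in vec]

    return embed

def build_index(chunks, embed):
    """Index time: embed every chunk once and store the vectors."""
    return [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(index, query, embed, k=2):
    """Query time: embed the query, score every chunk, return the top k."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]

chunks = [
    "pgvector adds vector search to postgres",
    "matryoshka embeddings can be truncated",
    "contrastive learning trains embedding models",
]
embed = make_toy_embedder(chunks)
index = build_index(chunks, embed)
top = retrieve(index, "vector search in postgres", embed, k=1)
print(top)
```

The same embedder must be used at index time and at query time; vectors from different models are not comparable.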

Embeddings are not model knowledge

A common misconception is that embeddings represent what a language model knows. They do not: the generative model and the embedding model are typically separate systems. Embeddings encode semantic relatedness; they do not store factual content. The facts come from the documents that the embeddings index.

Key properties

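Dimensionality: the length of the vector; higher dimensions typically capture more nuance at the cost of more storage and slower comparisons.
Matryoshka embeddings: embeddings from models trained so that the leading dimensions carry most of the information, allowing truncation to smaller dimensions with graceful quality degradation.
Context length limit: embedding models have a maximum input length (e.g., 512–8192 tokens); longer inputs are typically truncated or rejected, depending on the model or API, so long documents are split into chunks before embedding.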
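Two of the properties above lend themselves to a short sketch: truncating a Matryoshka-style vector to its leading dimensions, and splitting a long input so that each piece fits within a context limit. Both function names are hypothetical, the re-normalization step is a common practice rather than a requirement of any specific model, and the list of integers stands in for tokens produced by a model's own tokenizer.

```python
import math

def truncate_embedding(vec, dims):
    """Keep the first `dims` values and rescale the result to unit length.

    Only meaningful for models trained to support truncation; for other
    models, dropping dimensions may degrade quality sharply.
    """
    head = list(vec[:dims])
    length = math.sqrt(sum(v * v for v in head))
    if length == 0.0:
        return head
    return [v / length for v in head]

def chunk_tokens(tokens, max_tokens, overlap=0):
    """Split a token list into windows of at most `max_tokens`.

    Consecutive windows share `overlap` tokens so that text cut at a
    boundary still appears with some context in the next chunk.
    """
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("need max_tokens > 0 and 0 <= overlap < max_tokens")
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # the last window already reaches the end of the input
        start += max_tokens - overlap
    return chunks

short = truncate_embedding([3.0, 4.0, 12.0], 2)
windows = chunk_tokens(list(range(10)), max_tokens=4, overlap=1)
print(short)
print(windows)
```

Here the three-value vector is cut to two values and rescaled to length 1, and the ten tokens are split into three windows of four tokens that each share one token with the previous window.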
