Large language model

Large language model — A neural network trained on large text corpora using self-supervised objectives to model the probability of text sequences, enabling generation, classification, and reasoning across language tasks.

Overview

A large language model (LLM) is a type of neural language model distinguished by scale: very large parameter counts (billions to hundreds of billions), trained on very large text corpora (hundreds of billions to trillions of tokens), using self-supervised objectives. The objective is typically next-token prediction (autoregressive, decoder-only models) or masked-token prediction (encoder-only models).
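To make the next-token objective concrete, the sketch below fits a toy bigram model by counting and scores a sequence with the chain rule. It is an illustration only: the function names and the six-word corpus are hypothetical, and a real LLM replaces the count table with a neural network conditioned on the whole preceding context.

```python
import math
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count next-token frequencies; the training targets come from the text itself."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_probs(counts, prev):
    """Normalize counts into a conditional distribution P(next | prev)."""
    total = sum(counts[prev].values())
    return {tok: c / total for tok, c in counts[prev].items()}

def sequence_log_prob(counts, tokens):
    """Chain rule: sum log P(t_i | t_{i-1}) over the sequence.

    Raises KeyError for a transition never seen in training (no smoothing).
    """
    logp = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        logp += math.log(next_token_probs(counts, prev)[nxt])
    return logp

# Hypothetical toy corpus, split on whitespace instead of a real tokenizer.
corpus = "the cat sat on the mat".split()
model = train_bigram(corpus)
probs_after_the = next_token_probs(model, "the")
score = sequence_log_prob(model, ["the", "cat", "sat"])
```

In this corpus "the" is followed once by "cat" and once by "mat", so each receives probability 0.5. No labels were supplied by hand, which is what "self-supervised" means here.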

The threshold for "large" is not precisely defined and has shifted over time as scale has increased. In practice, models are called LLMs when they exhibit emergent capabilities (task generalization, in-context learning, and chain-of-thought reasoning) that are absent or weak in smaller models trained on similar data.

The terms "foundation model" and "large language model" are often used interchangeably for text-based models; "foundation model" is the broader term and also covers multimodal models (for example image, audio, or code) trained at similar scales.

Key architectural properties

Most modern LLMs are decoder-only transformer architectures (GPT family, LLaMA, Mistral). Key properties:

Parameters: the number of learned weights; ranges from a few billion (small LLMs, around 7B) to more than 400B (frontier models).
Context window: the maximum token sequence the model can process in one call.
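Tokenizer: the subword vocabulary and algorithm (for example BPE, often implemented with a library such as SentencePiece) used to convert text to tokens (see Tokenization).
Sampling parameters: inference-time settings (temperature, top-p, top-k) that control output diversity.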
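The sampling parameters listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any particular library's implementation: `sample_next_token` and the example logits are hypothetical, and real inference stacks may apply these filters in a different order or renormalize differently.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Sample one token from a dict of token -> unnormalized score (logit)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if top_k is not None and top_k < 1:
        raise ValueError("top_k must be at least 1")

    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = {tok: score / temperature for tok, score in logits.items()}

    # Softmax, subtracting the maximum for numerical stability.
    peak = max(scaled.values())
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(((tok, e / total) for tok, e in exps.items()),
                    key=lambda item: item[1], reverse=True)

    # Top-k: keep only the k most probable tokens.
    if top_k is not None:
        ranked = ranked[:top_k]

    # Top-p (nucleus): keep the smallest prefix whose probability mass reaches top_p.
    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, prob in ranked:
            kept.append((tok, prob))
            cumulative += prob
            if cumulative >= top_p:
                break
        ranked = kept

    tokens = [tok for tok, _ in ranked]
    weights = [prob for _, prob in ranked]
    # random.choices accepts unnormalized weights, so no renormalization is needed.
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical scores for three candidate tokens.
example_logits = {"a": 2.0, "b": 1.0, "c": 0.1}
greedy_like = sample_next_token(example_logits, top_k=1)
```

With `top_k=1` only the highest-scoring token survives, so the call behaves like greedy decoding. Raising `temperature` or `top_p` admits lower-probability tokens and increases diversity.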

LLM knowledge and its limits

An LLM's parametric knowledge is the information encoded in its weights during training. It is:

Static: frozen at training cutoff; the model has no knowledge of events after its cutoff.
Implicit: facts are not stored as discrete records but as distributed weight patterns.
Lossy: rare facts are encoded weakly; the model may hallucinate when asked about low-frequency information.

Retrieval-augmented generation (RAG) mitigates the static and lossy properties by supplying external documents at inference time.
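The idea of supplying external documents at inference time can be sketched as follows. The retriever here is a deliberately crude word-overlap ranking, and the prompt template, function names, and sample documents are all hypothetical; production systems typically use embedding-based search and pass the resulting prompt to an actual model.

```python
def words(text):
    """Lowercased whitespace-split word set; a stand-in for real text processing."""
    return set(text.lower().split())

def retrieve(query, documents, k=2):
    """Return the k documents sharing the most words with the query."""
    query_words = words(query)
    ranked = sorted(documents,
                    key=lambda doc: len(query_words & words(doc)),
                    reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=2):
    """Place retrieved text in the prompt so the model can read it at inference time."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

# Hypothetical document store and question.
docs = [
    "The release notes list a new tokenizer",
    "Cats sleep most of the day",
    "The release notes were published in March",
]
question = "what do the release notes list"
prompt = build_prompt(question, docs, k=2)
```

Because the retrieved text sits in the context window rather than in the weights, it can be updated without retraining the model.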

Distinction from related terms

Term	Distinction
Foundation model	Broader: includes multimodal models; LLMs are text-dominant foundation models
Generative AI	Broader category; includes image/video generators; LLMs are a subset
Language model (general)	Includes smaller models (n-gram, BERT-scale); LLMs are the scale-distinguished tier
Chatbot	An application; an LLM is the underlying model
