Large language model
Overview
A large language model (LLM) is a neural language model distinguished by scale: it has a very large parameter count (billions to hundreds of billions), is trained on very large text corpora (hundreds of billions to trillions of tokens), and uses a self-supervised objective, typically next-token prediction (autoregressive models) or masked-token prediction (encoder-only models).
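The next-token objective can be illustrated with a minimal sketch. It assumes a hypothetical `predict_probs` callable that stands in for a model and returns a probability distribution over the vocabulary given a prefix; the uniform "model" and the 4-token vocabulary are illustrative only, not part of any real library.

```python
import math

def next_token_loss(token_ids, predict_probs):
    """Average negative log-likelihood of each token given the tokens before it."""
    total = 0.0
    for t in range(1, len(token_ids)):
        # Distribution over the vocabulary, conditioned on the prefix token_ids[:t].
        probs = predict_probs(token_ids[:t])
        total += -math.log(probs[token_ids[t]])
    return total / (len(token_ids) - 1)

# Hypothetical stand-in for a model: uniform over a 4-token vocabulary.
def uniform_model(prefix):
    return [0.25, 0.25, 0.25, 0.25]

loss = next_token_loss([0, 2, 1, 3], uniform_model)
```

Training lowers this loss by shifting probability toward the token that actually follows each prefix; a model that knows nothing (the uniform case) scores the log of the vocabulary size.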
The threshold for "large" is not precisely defined and has shifted upward over time as model scale has increased. In practice, models are called LLMs when they exhibit emergent capabilities (task generalization, in-context learning, and chain-of-thought reasoning) that are absent or weak in smaller models trained on similar data.
The terms foundation model and large language model are often used interchangeably for text-based models; foundation model is the broader term and also covers multimodal models (image, audio, code) trained at similar scale.
Key architectural properties
Most modern LLMs use a decoder-only transformer architecture (GPT family, LLaMA, Mistral). Key properties:
- Parameters: the number of learned weights; ranges from a few billion (small LLMs, for example ~7B) to more than 400B (frontier models).
- Context window: the maximum token sequence the model can process in one call.
- Tokenizer: the subword vocabulary (for example, BPE or one built with SentencePiece) used to convert text to tokens (see Tokenization).
- Sampling parameters: temperature, top-p, and top-k, which are set at inference time and control output diversity.
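How the sampling parameters interact can be sketched in a few lines. This is a simplified illustration in plain Python, not the implementation of any particular library: the function name `sample_next_token` and its argument names are assumptions, and it expects `temperature > 0`.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Pick one token id from raw logits using temperature, top-k and top-p."""
    # Temperature scaling: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [x / temperature for x in logits]
    # Softmax, subtracting the max for numerical stability.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Token ids ordered from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        # Top-k: keep only the k most probable tokens.
        ranked = ranked[:top_k]
    if top_p is not None:
        # Top-p (nucleus): keep the smallest prefix whose cumulative probability reaches top_p.
        kept, cumulative = [], 0.0
        for i in ranked:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        ranked = kept
    # Sample among the surviving tokens, weighted by their probabilities.
    weights = [probs[i] for i in ranked]
    return rng.choices(ranked, weights=weights, k=1)[0]

greedy_like = sample_next_token([2.0, 1.0, 0.1], top_k=1)  # only token 0 survives
```

Setting `top_k=1` reduces sampling to always picking the most probable token, while a higher temperature with no truncation gives the most varied output.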
LLM knowledge and its limits
An LLM's parametric knowledge is the information encoded in its weights during training. It is:
- Static: frozen at the training cutoff; the model has no knowledge of events after that date.
- Implicit: facts are not stored as discrete records but as distributed weight patterns.
- Lossy: rare facts are encoded weakly; the model may hallucinate low-frequency information.
Retrieval-augmented generation (RAG) addresses the static and lossy properties by supplying external documents at inference time.
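The idea of supplying external documents at inference time can be shown with a toy sketch. The word-overlap scoring, the function names `retrieve` and `build_prompt`, and the prompt wording are all illustrative assumptions; real systems typically use embedding-based search and pass the resulting prompt to a model.

```python
def retrieve(query, documents, k=2):
    """Return the k documents sharing the most words with the query (toy scoring)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, documents, k=2):
    """Place the retrieved documents in the prompt ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

docs = [
    "The context window is the maximum token sequence a model can process in one call.",
    "Temperature controls output diversity during sampling.",
    "Tokenizers convert text to tokens.",
]
prompt = build_prompt("What is the context window?", docs, k=1)
```

Because the retrieved text travels inside the prompt, it can be updated without retraining the model, but it also consumes part of the context window.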
Related terms
| Term | Distinction |
|---|---|
| Foundation model | Broader: includes multimodal models; LLMs are text-dominant foundation models |
| Generative AI | Broader category; includes image/video generators; LLMs are a subset |
| Language model (general) | Includes smaller models (n-gram, BERT-scale); LLMs are the scale-distinguished tier |
| Chatbot | An application; an LLM is the underlying model |
See also
- Context window
- Tokenization
- Hallucination
- Retrieval-augmented generation
- In-context learning
- Fundamentals