Large language model

From llmref.wiki
Large language model — A neural network trained on large text corpora using self-supervised objectives to model the probability of text sequences, enabling generation, classification, and reasoning across language tasks.

Overview

A large language model (LLM) is a neural language model distinguished by scale: parameter counts in the billions to hundreds of billions, training corpora of hundreds of billions to trillions of tokens, and self-supervised training objectives, typically next-token prediction (autoregressive, decoder-only models) or masked-token prediction (encoder-only models).
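As a rough illustration of the next-token prediction objective, the sketch below computes the average negative log-likelihood of the tokens that actually came next, given per-position probability distributions. The function name `next_token_loss` and the toy three-token vocabulary are assumptions made for this example; real training code operates on model logits in a tensor library rather than on hand-written lists.

```python
import math

def next_token_loss(probs_per_step, target_ids):
    """Average negative log-likelihood of the observed next tokens.

    probs_per_step[t] is a (hypothetical) model's predicted distribution over
    the vocabulary at position t; target_ids[t] is the token that came next.
    """
    nll = 0.0
    for probs, target in zip(probs_per_step, target_ids):
        nll -= math.log(probs[target])
    return nll / len(target_ids)

# Toy setup: vocabulary of 3 tokens, 2 prediction steps (illustrative values).
probs = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
targets = [0, 2]

loss = next_token_loss(probs, targets)
perplexity = math.exp(loss)  # exponentiated loss, a common reporting metric
```

Lower loss means the model assigned higher probability to the text it was shown; training adjusts the weights to reduce this quantity over the whole corpus.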

The threshold for "large" is not precisely defined and has shifted upward as models have grown. In practice, models are called LLMs when they exhibit emergent capabilities (task generalization, in-context learning, and chain-of-thought reasoning) that are absent or weak in smaller models trained on similar data.
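In-context learning can be made concrete with a few-shot prompt: labelled examples are placed in the input and the model is expected to continue the pattern without any weight update. The sketch below only assembles such a prompt; the `Input:`/`Label:` layout and the function name `few_shot_prompt` are illustrative assumptions, not a format required by any particular model.

```python
def few_shot_prompt(examples, query):
    """Format labelled examples followed by an unlabelled query."""
    blocks = [f"Input: {text}\nLabel: {label}" for text, label in examples]
    # The final block leaves the label empty for the model to complete.
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)

examples = [
    ("The film was wonderful.", "positive"),
    ("I want my money back.", "negative"),
]
prompt = few_shot_prompt(examples, "Best meal I have had in years.")
```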

Foundation model and large language model are often used interchangeably for text-based models; foundation model is the broader term and also covers multimodal models (image, audio, code) trained at similar scales.

Key architectural properties

Most modern LLMs use a decoder-only transformer architecture (for example the GPT family, LLaMA, and Mistral). Key properties:

  • Parameters: the number of learned weights; ranges from ~7B (small LLMs) to >400B (frontier models).
  • Context window: the maximum token sequence the model can process in one call.
  • Tokenizer: the vocabulary and segmentation scheme (commonly BPE, often built with a toolkit such as SentencePiece) used to convert text to tokens (see Tokenization).
  • Sampling parameters: temperature, top-p, top-k controlling output diversity.
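The sampling parameters in the list above can be sketched as a single decoding step over raw logits. This is a minimal illustration using only the standard library; the function name `sample_token` and the order in which the filters are applied (temperature, then top-k, then top-p) are assumptions for this example, and production inference libraries may differ in detail.

```python
import math
import random

def sample_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Sample one token id from raw logits (top_k, if given, must be >= 1)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature: values < 1 sharpen the distribution, values > 1 flatten it.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Token ids ordered from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Top-k: keep only the k most probable tokens.
    if top_k is not None:
        ranked = ranked[:top_k]
    # Top-p (nucleus): keep the smallest prefix whose cumulative mass reaches top_p.
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in ranked:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        ranked = kept
    # random.choices renormalises the surviving weights before drawing.
    weights = [probs[i] for i in ranked]
    return rng.choices(ranked, weights=weights, k=1)[0]

logits = [2.0, 1.0, 0.1]  # illustrative scores for a 3-token vocabulary
greedy_like = sample_token(logits, top_k=1)
nucleus = sample_token(logits, top_p=0.5)
```

With `top_k=1` the step is equivalent to greedy decoding; loosening `top_k` or `top_p`, or raising `temperature`, admits less probable tokens and increases output diversity.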

LLM knowledge and its limits

An LLM's parametric knowledge is the information encoded in its weights during training. It is:

  • Static: frozen at training cutoff; the model has no knowledge of events after its cutoff.
  • Implicit: facts are not stored as discrete records but as distributed weight patterns.
  • Lossy: rare facts are encoded weakly; the model may hallucinate when asked about low-frequency information.

Retrieval-augmented generation (RAG) addresses the static and lossy properties by supplying relevant external documents in the prompt at inference time.
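The idea of supplying external documents at inference time can be sketched as retrieve-then-prompt. The word-overlap scoring below is a deliberately crude stand-in for a real retriever (which would typically use embeddings or a search index), and the names `retrieve` and `build_prompt`, the prompt wording, and the sample documents are all hypothetical.

```python
def retrieve(query, documents, k=1):
    """Return the k documents sharing the most words with the query."""
    query_words = set(query.lower().split())

    def overlap(doc):
        return len(query_words & set(doc.lower().split()))

    return sorted(documents, key=overlap, reverse=True)[:k]

def build_prompt(query, documents, k=1):
    """Place the retrieved documents ahead of the question in the prompt."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Use the context to answer.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
        "Answer:"
    )

docs = [
    "The library opens at 9 am on weekdays.",
    "Parking is free after 6 pm.",
]
prompt = build_prompt("When does the library open?", docs)
```

The resulting prompt would then be passed to the model, which can draw on the supplied text instead of relying only on its parametric knowledge.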

Distinction from related terms

Term                       Distinction
-------------------------  ----------------------------------------------------------------------------------
Foundation model           Broader: includes multimodal models; LLMs are text-dominant foundation models
Generative AI              Broader category; includes image/video generators; LLMs are a subset
Language model (general)   Includes smaller models (n-gram, BERT-scale); LLMs are the scale-distinguished tier
Chatbot                    An application; an LLM is the underlying model
