Reasoning model
Overview
A reasoning model is a large language model designed to generate its intermediate thinking process, and often display it, before producing a final response. Rather than directly outputting an answer, reasoning models break problems down into constituent steps, articulating the logical progression, constraint evaluation, and decision-making rationale. This approach can increase transparency in model behavior and can improve accuracy on complex tasks by forcing explicit consideration of sub-problems.
Reasoning models emerged from research into chain-of-thought prompting and related techniques, which demonstrated improved performance when models "show their work." The design of reasoning models embeds this behavior as a core capability rather than relying solely on prompt engineering. This represents a shift from end-to-end response generation toward treating intermediate reasoning as a first-class output.
The distinction between reasoning models and standard language models lies in training objectives and output structure. Reasoning models are typically trained with signals that reward intermediate step quality, not just final answer correctness. Such training often relies on step-level annotations or synthetic reasoning traces during model development, although reinforcement learning on final-answer rewards is also used.
How it works
Reasoning models operate through a multi-stage generation process:
1. Problem encoding: The model receives and encodes the input query or problem statement.
2. Reasoning token generation: The model generates a sequence of "reasoning tokens" or explicit thinking steps. These may be demarcated by special tokens (e.g., `<reasoning>`, `</reasoning>`), natural language markers, or structural formatting. The model predicts what intermediate consideration is necessary before proceeding.
3. Answer generation: Following reasoning completion, the model generates the final answer, constrained by or derived from the reasoning sequence.
4. Output formatting: The final answer is returned to the user, together with the reasoning trace (or a summary of it) when the system exposes one, with clear demarcation between the two phases.
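The demarcation described in steps 2 to 4 can be sketched in a few lines. This is a minimal illustration, assuming the model emits its trace between literal `<reasoning>` and `</reasoning>` markers as in the example above; the names `ParsedOutput` and `split_reasoning` are hypothetical, and real systems use their own delimiters and APIs.

```python
import re
from typing import NamedTuple


class ParsedOutput(NamedTuple):
    reasoning: str  # text found inside the reasoning delimiters
    answer: str     # everything after the closing delimiter


# Assumed delimiter format; actual models define their own special tokens.
_REASONING_BLOCK = re.compile(r"<reasoning>(.*?)</reasoning>", re.DOTALL)


def split_reasoning(raw: str) -> ParsedOutput:
    """Separate the reasoning trace from the final answer in raw model text."""
    match = _REASONING_BLOCK.search(raw)
    if match is None:
        # No reasoning block found: treat the whole output as the answer.
        return ParsedOutput(reasoning="", answer=raw.strip())
    return ParsedOutput(
        reasoning=match.group(1).strip(),
        answer=raw[match.end():].strip(),
    )


raw_output = (
    "<reasoning>17 is odd. It is not divisible by 3. "
    "Since 5 * 5 = 25 > 17, checking 2 and 3 is enough.</reasoning>\n"
    "Yes, 17 is prime."
)
parsed = split_reasoning(raw_output)
print(parsed.answer)  # Yes, 17 is prime.
```

Keeping the two parts in separate fields lets a caller decide independently whether to show, summarize, or discard the reasoning trace.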
Training reasoning models typically involves:
- Data preparation: Collecting or generating step-level annotations for training examples. This may include human-written reasoning traces, synthetic traces generated by teacher models, or in-context learning demonstrations.
- Auxiliary loss functions: Training objectives that score both reasoning quality and final answer accuracy, penalizing poor intermediate steps as well as wrong answers, rather than scoring the final answer alone.
- Inference-time scaling: Providing a larger output-token budget at inference time, so that the model can generate longer reasoning sequences and token limits do not cut the reasoning short.
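One way to picture an objective that scores both reasoning steps and the final answer is a weighted blend of the two. The sketch below is illustrative only and does not describe any specific model's training recipe: the function `combined_reward`, its parameters, and the 0.5 default weight are hypothetical, and the per-step scores are assumed to come from annotators or a separate scoring model.

```python
from typing import Sequence


def combined_reward(
    step_scores: Sequence[float],
    answer_correct: bool,
    step_weight: float = 0.5,
) -> float:
    """Blend mean step quality with final-answer correctness.

    step_scores: per-step quality scores in [0, 1] (assumed to be given).
    step_weight: fraction of the reward assigned to step quality.
    """
    if not 0.0 <= step_weight <= 1.0:
        raise ValueError("step_weight must be between 0 and 1")
    # Average the step scores; an empty trace earns no step credit.
    step_quality = sum(step_scores) / len(step_scores) if step_scores else 0.0
    outcome = 1.0 if answer_correct else 0.0
    return step_weight * step_quality + (1.0 - step_weight) * outcome


# Same reasoning trace, with a correct and an incorrect final answer.
print(combined_reward([1.0, 0.5], answer_correct=True))   # 0.875
print(combined_reward([1.0, 0.5], answer_correct=False))  # 0.375
```

Setting `step_weight` to 0 recovers a purely outcome-based signal, which makes the contrast with answer-only training explicit.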
The computational cost of reasoning models is higher than that of standard models because of the additional tokens generated during the reasoning phase. A reasoning model that generates 5,000 reasoning tokens before a 100-token answer produces 5,100 output tokens, 51 times as many as a model producing the 100-token answer directly. Inference compute grows at least in proportion to this token count.
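The arithmetic behind this comparison can be checked directly. The helper below is a hypothetical illustration that counts output tokens only; it treats token count as a rough proxy for cost and ignores the fact that per-token cost also depends on context length. The reasoning tokens alone are 50 times the answer length, and the total output, reasoning plus answer, is 51 times the answer length.

```python
def output_token_ratio(reasoning_tokens: int, answer_tokens: int) -> float:
    """Ratio of total output tokens (reasoning + answer) to answer tokens alone."""
    if answer_tokens <= 0:
        raise ValueError("answer_tokens must be positive")
    if reasoning_tokens < 0:
        raise ValueError("reasoning_tokens must not be negative")
    return (reasoning_tokens + answer_tokens) / answer_tokens


print(output_token_ratio(5_000, 100))  # 51.0
print(output_token_ratio(0, 100))      # 1.0
```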
| Term | Distinction |
|---|---|
| Chain-of-thought | Chain-of-thought is a prompting technique applied to existing models to encourage multi-step reasoning. Reasoning models bake this capability into their training and inference architecture. Chain-of-thought is a prompt strategy; reasoning models are architecturally optimized systems. |
| ReAct | ReAct (Reasoning + Acting) combines reasoning with external tool use in agentic workflows. Reasoning models focus on producing explicit intermediate thinking; ReAct models additionally execute actions and observe results. ReAct implies an agentic loop; reasoning models may not involve tool use. |
| Foundation model | A foundation model is a large pre-trained model adapted to multiple downstream tasks. A reasoning model is a specific design variant of a foundation model that prioritizes interpretable reasoning steps. All reasoning models are foundation models; not all foundation models are reasoning models. |
| Prompt engineering | Prompt engineering modifies the input text to elicit desired behavior from an existing model. Reasoning models achieve step-by-step reasoning through training and architecture, not prompt manipulation alone. Prompt engineering is an input-level technique; reasoning models are system-level designs. |
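To make the first row of the table concrete, chain-of-thought prompting changes only the input text. The sketch below uses the widely cited zero-shot phrasing "Let's think step by step"; the function name `with_chain_of_thought` and the `Q:`/`A:` layout are assumptions for illustration. A reasoning model, by contrast, is expected to produce intermediate steps even when given the bare question.

```python
def with_chain_of_thought(question: str) -> str:
    """Wrap a question in a zero-shot chain-of-thought prompt."""
    return f"Q: {question}\nA: Let's think step by step."


prompt = with_chain_of_thought("Is 17 a prime number?")
print(prompt)
```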
Examples
- OpenAI o1 (2024): OpenAI's o1-preview and o1 models explicitly generate reasoning tokens ("thinking") before answering. The raw reasoning is hidden from the user by default, although a summary of it may be shown, and the model spends a variable number of reasoning tokens depending on problem difficulty.[1]
- DeepSeek-R1 (2025): DeepSeek's R1 model uses a similar approach, generating long reasoning chains, often thousands of tokens, followed by concise answers. The model is trained largely through reinforcement learning to produce reasoning traces and is optimized for mathematical, coding, and logical reasoning tasks.[2]
- Google Gemini 2.0 Flash Thinking (2024): Google's Gemini 2.0 Flash Thinking, released as an experimental variant of Gemini 2.0 Flash, is trained to generate its thinking process before responding, in line with the reasoning model behavior described above.
See also
- Chain-of-thought – foundational prompting technique for multi-step reasoning
- Prompt engineering – input-level technique to elicit model behavior
- ReAct – reasoning combined with tool use in agentic systems
- In-context learning – learning from examples in the prompt context
- Agentic AI vs AI agent – multi-step autonomous behavior frameworks
- Foundation model – large pre-trained models adapted to multiple tasks
References
1. OpenAI. "Introducing o1, our new reasoning model." November 2024. https://openai.com/o1
2. DeepSeek-AI. "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning." 2025.