Tree of Thoughts

Tree of Thoughts — Prompt engineering strategy that explores multiple reasoning paths through tree search before selecting the best solution path.

Overview

Tree of Thoughts (ToT) is a prompting strategy that extends chain-of-thought reasoning by enabling large language models to explore multiple branching decision points and reasoning paths simultaneously, rather than following a single linear trajectory. The approach structures problem-solving as a tree traversal, where each node represents an intermediate reasoning state and edges represent possible reasoning steps. The model evaluates multiple candidate continuations at each node, prunes low-quality branches, and selects the most promising path forward before reaching a final solution.

The method addresses a fundamental limitation of linear chain-of-thought prompting: that single-path reasoning may commit early to suboptimal intermediate steps from which recovery is difficult. By maintaining and exploring alternative reasoning branches—similar to breadth-first or depth-first search in traditional algorithms—Tree of Thoughts allows models to backtrack and explore different solution strategies within a single problem-solving episode.

Tree of Thoughts is applied to tasks requiring iterative decision-making, logical decomposition, and evaluation of intermediate states. Common use cases include mathematical problem-solving, code generation, planning tasks, and complex retrieval scenarios where early choices significantly constrain later options. The approach typically requires an evaluator or heuristic to score the promise of different branches, which may be the model itself (through self-evaluation) or an external scoring mechanism.

How it works

Tree of Thoughts operates through the following procedural steps:

State Representation: Each node in the tree represents a partial solution or intermediate reasoning state—typically a sequence of thought steps generated by the model. The root node begins with the original problem statement.

Candidate Generation: At each node, the model generates multiple candidate next steps (children), typically 2–5 continuations. These represent different possible directions the reasoning could take.

Evaluation and Scoring: A scoring or evaluation function ranks the candidate branches according to their estimated promise. This may be implemented as:

Self-evaluation by the model (e.g., "Rate how likely this path solves the problem: 1–10")
A separate reranker or evaluator component
LLM-as-judge criteria applied to intermediate states
Heuristic scoring based on domain constraints

Expansion Strategy: The search strategy determines which branches to explore further. Common approaches include:

Breadth-first expansion (explore all children at one depth before going deeper)
Greedy selection (expand only the highest-scoring branch)
Beam search (maintain top-k branches and expand them in parallel)
Depth-first with backtracking (explore one path fully, then backtrack to explore alternatives)

Termination: Search terminates when any branch reaches a satisfactory solution, a depth limit, or a computational budget. The final solution is extracted from the terminal node of the selected path.

The computational cost of Tree of Thoughts is higher than single-path reasoning because it requires multiple model calls (one per candidate generation) and potentially many branches explored. Practical implementations typically constrain tree width (number of children per node) and depth to maintain tractability.

Distinction from related terms

Term	Distinction
Chain-of-Thought (CoT)	CoT generates a single linear reasoning sequence step-by-step. ToT explores multiple branching paths and explicitly evaluates them, allowing backtracking. CoT is single-path; ToT is multi-path search.
Prompt chaining	Prompt chaining sequences multiple prompts or model calls in a predetermined order to decompose a task. ToT dynamically explores which sequence of steps is best through evaluation and search. Chaining is deterministic; ToT is adaptive.
ReAct	ReAct cycles between reasoning and external action (tool use) in a loop. ToT reasons over multiple hypothetical futures without necessarily taking actions. ReAct interleaves interaction with the environment; ToT explores reasoning space.
Least-to-most prompting	Least-to-most decomposition breaks a problem into sub-problems solved in sequence from simplest to hardest. ToT explores alternative solution paths at the same problem depth. Least-to-most is hierarchical decomposition; ToT is breadth-aware exploration.
Automated evaluation	Automated evaluation assesses the quality of final outputs. ToT uses evaluation as an integral part of search to guide path selection during reasoning. Evaluation is post-hoc in isolation; in ToT it is part of the search control flow.

Examples

Mathematical Problem Solving: The original Tree of Thoughts paper^[1] demonstrated ToT on Game of 24, a mathematical puzzle where four numbers must be combined with arithmetic operators to reach 24. Standard chain-of-thought prompting succeeded on only 4% of test cases; ToT with beam search exploration reached 74% success by exploring alternative arithmetic orderings and operator choices.

Code Generation and Bug Fixing: ToT has been applied to generating and validating code solutions where multiple implementation strategies exist. The model generates candidate code snippets, evaluates them against test cases (using execution as the scoring signal), and backtracks to explore alternative approaches if a candidate fails. This allows iterative refinement without restarting from the problem statement.

Planning and Task Decomposition: In agentic workflows requiring sequential decision-making (e.g., "write an essay outline, then expand each section, then integrate citations"), ToT explores alternative decomposition orders and intermediate plans, using LLM-as-judge to score how well each decomposition satisfies constraints like coherence and relevance.

References

↑ Yao et al. "Tree of Thoughts: Deliberate Problem Solving with Large Language Models." arXiv preprint arXiv:2305.10601 (2023).

[yao2023tot-1] Yao et al. "Tree of Thoughts: Deliberate Problem Solving with Large Language Models." arXiv preprint arXiv:2305.10601 (2023).

[1]

Anonymous

Search

Tree of Thoughts

Namespaces

More

Page actions

Contents

Overview

How it works

Distinction from related terms

Examples

See also

References

Navigation

Navigation

Wiki tools

Wiki tools

Anonymous

Search

Tree of Thoughts

Overview

How it works

Distinction from related terms

Examples

See also

References

Navigation

Wiki tools

Page tools

Categories