Critic agent

From llmref.wiki
Critic agent — A secondary agent that evaluates and scores outputs from a primary agent to verify quality, correctness, or policy compliance.

Overview

A critic agent is a specialized component in multi-agent systems that functions as a quality control mechanism for outputs generated by primary or execution agents. Rather than performing task execution directly, the critic agent reviews, scores, and validates the work of other agents against defined criteria such as factual accuracy, coherence, safety compliance, or domain-specific requirements. This pattern emerged from research in agentic workflows and LLM-as-judge methodologies, where language models are repurposed to evaluate rather than generate primary content.

Within an agentic workflow, the critic agent typically sits downstream of a generator or task agent: it receives that agent's output and applies a structured evaluation to it. The critic may return a binary pass/fail signal, a numerical quality score, or detailed diagnostic feedback that routes the output back to the primary agent for refinement. This design reduces the likelihood of deploying flawed outputs, and the resulting feedback loop can improve output quality across successive revisions. Critic agents are particularly valuable in safety-critical applications where content filtering alone may be insufficient and human review capacity is limited.
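The routing behaviour described above can be sketched as a bounded generate, critique, refine loop. This is a minimal Python sketch under stated assumptions: `generate` and `critique` are hypothetical callables standing in for the primary and critic agents (in practice each would wrap a model call), `Verdict` is an illustrative container, and the default of three rounds is arbitrary rather than a recommendation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    passed: bool    # binary pass/fail signal
    score: float    # quality score, assumed to lie in [0, 1]
    feedback: str   # diagnostic feedback for the primary agent


def critic_loop(
    task: str,
    generate: Callable[[str, Optional[str]], str],  # hypothetical primary agent
    critique: Callable[[str, str], Verdict],        # hypothetical critic agent
    max_rounds: int = 3,
) -> tuple[str, Verdict, bool]:
    """Return (final output, last verdict, escalated_to_human)."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        output = generate(task, feedback)
        verdict = critique(task, output)
        if verdict.passed:
            return output, verdict, False
        # Route the critic's diagnostics back to the generator for another attempt.
        feedback = verdict.feedback
    # Retry budget exhausted: escalate rather than ship an output the critic rejected.
    return output, verdict, True


# Toy stand-ins: the "generator" only fixes its output once it receives feedback.
def toy_generate(task: str, feedback: Optional[str]) -> str:
    return task.upper() if feedback else task


def toy_critique(task: str, output: str) -> Verdict:
    ok = output.isupper()
    return Verdict(ok, 1.0 if ok else 0.0, "" if ok else "Use upper case.")


output, verdict, escalated = critic_loop("hello", toy_generate, toy_critique)
print(output, verdict.score, escalated)  # HELLO 1.0 False
```

Capping the number of rounds keeps a generator and critic that never converge from looping indefinitely; the escalation flag is where a human review step would attach.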

Implementing a critic agent requires careful design of evaluation criteria and scoring rubrics. These systems often draw on techniques from automated evaluation, human evaluation, and LLM-as-judge frameworks to establish consistent assessment standards. The critic's prompt typically specifies what counts as acceptable quality, which may include hallucination detection, factual consistency checks, citation accuracy, or safety alignment verification.
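One common way to make a rubric explicit is to keep it as data, render it into the critic's prompt, and require a machine-readable verdict. The sketch below assumes the critic model is asked to reply with a JSON object mapping each criterion name to an integer score from 1 to 5; the criterion names and wording, the scale, and the `build_prompt` and `parse_verdict` helpers are illustrative assumptions, not a standard format.

```python
import json

# Hypothetical rubric: criterion name -> what the critic should check.
RUBRIC = {
    "factual_consistency": "Every claim is supported by the provided context.",
    "citation_accuracy": "Each citation points to a source that contains the cited claim.",
    "safety": "The response follows the stated safety policy.",
}
SCALE = (1, 5)  # inclusive integer score range


def build_prompt(query: str, context: str, response: str) -> str:
    """Render the rubric and the material under review into one critic prompt."""
    criteria = "\n".join(f"- {name}: {desc}" for name, desc in RUBRIC.items())
    return (
        "You are a critic. Score the response on each criterion from "
        f"{SCALE[0]} to {SCALE[1]}.\n"
        f"Criteria:\n{criteria}\n\n"
        f"Query:\n{query}\n\nContext:\n{context}\n\nResponse:\n{response}\n\n"
        'Reply with JSON only, e.g. {"factual_consistency": 4, ...}.'
    )


def parse_verdict(raw: str) -> dict[str, int]:
    """Validate the critic's JSON reply against the rubric; raise on malformed output."""
    data = json.loads(raw)
    missing = set(RUBRIC) - set(data)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    scores = {}
    for name in RUBRIC:
        value = data[name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) \
                or not SCALE[0] <= value <= SCALE[1]:
            raise ValueError(f"invalid score for {name}: {value!r}")
        scores[name] = value
    return scores


print(parse_verdict('{"factual_consistency": 4, "citation_accuracy": 5, "safety": 5}'))
# {'factual_consistency': 4, 'citation_accuracy': 5, 'safety': 5}
```

Rejecting a malformed reply, rather than guessing at its meaning, lets the orchestrator retry the critic or fall back to human review.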

How it works

A critic agent operates through a multi-stage evaluation pipeline:

  1. Input Reception: The critic receives output artifacts from a primary agent, along with optional context such as the original user query, retrieval results, or agent memory state.
  2. Criteria Application: The critic applies a structured evaluation prompt or an instructions-as-code specification that defines the quality dimensions. For content tasks, these may include faithfulness and groundedness checks; for reasoning tasks, the critic may verify chain-of-thought coherence.
  3. Evidence Extraction: The critic identifies specific evidence within the output that supports or contradicts each criterion. This may involve retrieval-augmented verification against authoritative sources or knowledge graphs to estimate hallucination risk.
  4. Scoring or Ranking: The output is assigned a score (binary, ordinal, or continuous) or ranked against alternatives. Scoring may use reference-based metrics such as BLEU or ROUGE when reference outputs exist, or custom rubrics specific to the evaluation domain.
  5. Feedback Generation: The critic produces either a simple pass/fail signal or detailed diagnostic feedback describing specific deficiencies. This feedback may route the output back to the primary agent for iterative refinement or escalate it to human review.
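The five stages can be traced in a deliberately simple critic that uses no language model at all. In this sketch, evidence extraction is approximated by word overlap between each output sentence and the retrieved sources, a crude stand-in for the model-based or retrieval-augmented verification described above. The tokenizer, the 0.5 overlap threshold, the all-sentences-supported pass rule, and the names `CriticInput`, `CriticResult`, and `evaluate` are assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CriticInput:                 # Stage 1: input reception
    output: str
    query: str = ""
    sources: list[str] = field(default_factory=list)


@dataclass
class CriticResult:
    score: float                   # fraction of sentences judged supported
    passed: bool
    feedback: list[str]


def _tokens(text: str) -> set[str]:
    """Lower-cased alphanumeric words; a toy tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def evaluate(inp: CriticInput, overlap_threshold: float = 0.5,
             pass_score: float = 1.0) -> CriticResult:
    # Stage 2: criteria application (a single groundedness criterion here).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", inp.output) if _tokens(s)]
    source_tokens = [_tokens(src) for src in inp.sources]
    unsupported = []
    for sentence in sentences:
        words = _tokens(sentence)
        # Stage 3: evidence extraction (best word overlap with any single source).
        best = max((len(words & src) / len(words) for src in source_tokens), default=0.0)
        if best < overlap_threshold:
            unsupported.append(sentence)
    # Stage 4: scoring (continuous score plus a binary pass signal).
    score = 1.0 if not sentences else 1 - len(unsupported) / len(sentences)
    # Stage 5: feedback generation.
    feedback = [f"Not supported by sources: {s!r}" for s in unsupported]
    return CriticResult(score=score, passed=score >= pass_score, feedback=feedback)


result = evaluate(CriticInput(
    output="The Eiffel Tower is in Paris. It was built on the Moon.",
    sources=["The Eiffel Tower is a landmark in Paris, France."],
))
print(result.score, result.passed)  # 0.5 False
print(result.feedback)  # ["Not supported by sources: 'It was built on the Moon.'"]
```

Word overlap misses paraphrases and is fooled by negation, which is why production critics typically delegate stage 3 to a model; the point here is only the shape of the pipeline.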

The critic agent itself is typically a large language model whose judgments are calibrated through specialized fine-tuning, reinforcement learning from human feedback, or in-context learning. Some systems implement ensemble critics, that is, multiple critic agents with different evaluation perspectives, to reduce the variance of individual judgments, much as pooling several human annotators yields more reliable labels than relying on one.
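An ensemble of critics needs an aggregation rule. A minimal sketch, assuming each critic returns a score on a shared 0 to 1 scale: take the median as the combined score (robust to a single outlying critic), count pass votes, and treat a wide spread between critics as a signal to escalate. The `aggregate` function and its thresholds are illustrative assumptions, not recommended values.

```python
from statistics import median


def aggregate(scores: list[float], pass_threshold: float = 0.7,
              max_spread: float = 0.3) -> dict:
    """Combine scores from several critics evaluating the same output."""
    if not scores:
        raise ValueError("need at least one critic score")
    votes = sum(s >= pass_threshold for s in scores)  # number of critics that pass it
    spread = max(scores) - min(scores)
    return {
        "score": median(scores),
        "majority_pass": votes > len(scores) / 2,
        # Large disagreement between critics suggests an ambiguous case.
        "escalate": spread > max_spread,
    }


print(aggregate([0.9, 0.8, 0.2]))
# {'score': 0.8, 'majority_pass': True, 'escalate': True}
```

Here two of three critics pass the output and the median is high, but the outlying critic widens the spread past the limit, so the case is flagged for review rather than silently accepted.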

Distinction from related terms

Term | Distinction
LLM-as-judge | A broad evaluation paradigm in which language models score content. A critic agent is a specific agentic implementation of this paradigm, integrated into an orchestrated workflow rather than applied as a standalone evaluator.
Automated evaluation | Covers all computational scoring methods, including metrics such as ROUGE and learned rankers. Critic agents are a subtype that use agent-like control flow and state awareness to make iterative evaluation and routing decisions.
Human evaluation | Relies on expert or crowd annotators. A critic agent automates parts of this process but typically cannot fully replace human judgment in nuanced or high-stakes domains, and often operates alongside human review loops.
Reranker | Scores and orders a set of existing candidates (e.g., search results). A critic agent usually evaluates a single generated artifact against absolute quality criteria and may return detailed feedback rather than only an ordering, although some critics also rank alternative outputs.
Content filtering | Applies heuristic or shallow learned rules to block prohibited content. A critic agent performs deeper, multi-dimensional evaluation of correctness, consistency, and alignment with task-specific quality standards.

Examples

  1. Constitutional AI Critique Loop: In Anthropic's Constitutional AI method, a model critiques and revises its own responses against a written set of principles (the "constitution"), and a later phase uses model-generated comparisons of responses, judged against those principles, as preference labels for reinforcement learning. Because the critique is produced by a model rather than by human reviewers and its signal feeds back into training, the method exemplifies the critic pattern applied to safety alignment.
  2. Generative Search Quality Review: In systems that generate answers for AI-powered search, such as AI Overviews or other generative engines, a critic agent can verify that generated summaries are grounded in retrieved sources, contain no hallucinated citations, and cite sources wherever a claim requires support. The critic checks faithfulness to the source material and may route low-confidence outputs back to the retriever or query rewriter.
  3. Code Generation Verification: In code generation workflows, a critic agent evaluates generated code for syntax correctness, compatibility with test cases, and adherence to acceptable use policies. Agentic coding tools such as GitHub Copilot Workspace apply related checks, validating generated code against linters and test suites in much the way a critic agent would.
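A code-generation critic can combine a static check with an execution check. The sketch below uses only the Python standard library: it parses candidate code with `ast.parse` to catch syntax errors, then runs caller-supplied test cases against a named function. The `code_critic` name and its test-case format are hypothetical, and `exec` provides no isolation, so a real system should run untrusted generated code in a sandbox.

```python
import ast


def code_critic(source: str, func_name: str,
                cases: list[tuple[tuple, object]]) -> tuple[bool, list[str]]:
    """Return (passed, diagnostics) for generated code that should define `func_name`."""
    # Static check: reject code that does not parse.
    try:
        ast.parse(source)
    except SyntaxError as err:
        return False, [f"syntax error on line {err.lineno}: {err.msg}"]
    # Execution check. WARNING: exec runs the code in this process with no sandbox.
    namespace: dict = {}
    try:
        exec(source, namespace)
    except Exception as err:
        return False, [f"error while loading code: {err!r}"]
    func = namespace.get(func_name)
    if not callable(func):
        return False, [f"no callable named {func_name!r}"]
    # Test-case check: collect one diagnostic per failing case.
    diagnostics = []
    for args, expected in cases:
        try:
            actual = func(*args)
        except Exception as err:
            diagnostics.append(f"{func_name}{args} raised {err!r}")
            continue
        if actual != expected:
            diagnostics.append(f"{func_name}{args} returned {actual!r}, expected {expected!r}")
    return not diagnostics, diagnostics


candidate = "def add(a, b):\n    return a - b\n"  # a buggy "generated" function
print(code_critic(candidate, "add", [((2, 3), 5), ((0, 0), 0)]))
# (False, ['add(2, 3) returned -1, expected 5'])
```

The diagnostics list doubles as the detailed feedback a critic would route back to the generating agent.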
