Faithfulness vs Groundedness

From llmref.wiki
Faithfulness vs Groundedness — Faithfulness measures overall agreement between an answer and its sources; groundedness measures whether each claim is supported by the provided context.

Overview

Faithfulness and groundedness are closely related metrics for evaluating retrieval-augmented generation (RAG) systems, and the two terms are often used interchangeably. When they are distinguished, faithfulness usually denotes the degree to which the answer as a whole is entailed by, and does not contradict, its sources, while groundedness denotes the property that each individual claim in the answer is supported by the supplied context. The RAGAS evaluation framework, among others, operationalizes faithfulness as the proportion of an answer's claims that can be inferred from the retrieved context.[1]
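
The proportion-of-claims reading of faithfulness can be written as a ratio. The notation here is an assumption introduced for illustration and is not taken from the cited paper: S(a) is the set of claims extracted from answer a, c is the retrieved context, and c ⊨ s means that claim s can be inferred from c.

```latex
\[
\mathrm{faithfulness}(a, c)
  = \frac{\bigl|\{\, s \in S(a) : c \models s \,\}\bigr|}{\bigl|S(a)\bigr|}
\]
```

Under this reading, a hypothetical answer with four claims, three of which can be inferred from the context, would score 3/4 = 0.75.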

Because the literature does not fully standardize the two terms, definitions should be stated explicitly when reporting scores.

How it is measured

  • Faithfulness: decompose the answer into claims, then compute the fraction that are supported by (inferable from) the provided context. The result is a single score between 0 and 1 for the whole answer; a low score indicates unsupported or contradicted statements.
  • Groundedness: assess, per claim, whether the context contains supporting evidence; closely related to claim-level attribution.

Both differ from answer relevance (whether the answer addresses the question) and from context precision/recall (whether retrieval supplied the right documents).
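
The two measurements described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by RAGAS or any other framework: the function names are hypothetical, claims are approximated by sentences, and toy_is_supported is a crude token-overlap stand-in for the entailment or LLM-based judge a real evaluator would use.

```python
import re
from typing import Callable, List, Tuple


def split_claims(answer: str) -> List[str]:
    """Naive claim decomposition (assumption): one claim per sentence."""
    parts = re.split(r"(?<=[.!?])\s+", answer.strip())
    return [p for p in parts if p]


def _tokens(text: str) -> set:
    """Lowercased alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_is_supported(claim: str, context: str) -> bool:
    """Toy stand-in for a support judge: True if every token of the
    claim also occurs in the context. Real systems use an NLI model
    or an LLM here; this is only for illustration."""
    return _tokens(claim) <= _tokens(context)


def groundedness(
    answer: str,
    context: str,
    is_supported: Callable[[str, str], bool] = toy_is_supported,
) -> List[Tuple[str, bool]]:
    """Per-claim verdicts: is each claim supported by the context?"""
    return [(c, is_supported(c, context)) for c in split_claims(answer)]


def faithfulness(
    answer: str,
    context: str,
    is_supported: Callable[[str, str], bool] = toy_is_supported,
) -> float:
    """Fraction of the answer's claims that are supported by the context."""
    verdicts = groundedness(answer, context, is_supported)
    if not verdicts:
        return 0.0  # arbitrary convention for an answer with no claims
    return sum(ok for _, ok in verdicts) / len(verdicts)


context = "The Eiffel Tower is in Paris. It was completed in 1889."
answer = (
    "The Eiffel Tower is in Paris. It was completed in 1889. "
    "It attracts 7 million visitors."
)
print(groundedness(answer, context))  # third claim is flagged as unsupported
print(faithfulness(answer, context))  # 2 of 3 claims supported
```

In this sketch the per-claim list plays the role of groundedness and the aggregate ratio plays the role of faithfulness, which mirrors the granularity distinction drawn above. Passing a different is_supported callable swaps in a stronger judge without changing the scoring logic.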

Distinction from related terms

Term              Granularity         Question answered
Faithfulness      Whole answer        Does the answer agree with its sources overall?
Groundedness      Per claim           Is each claim supported by the context?
Attribution       Per claim → source  Which source supports this claim?
Answer relevance  Whole answer        Does it address the question?

Faithfulness is not the same as factual correctness: an answer can be faithful to an incorrect source (supported by the context yet false in the world), and a factually true answer can be unfaithful if the context does not support it.
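
The independence of the two properties can be made concrete with a small sketch. Everything here is hypothetical: the classify function, the representation of context and world knowledge as sets of exact-match strings, and the fictional bridge facts are all assumptions chosen to keep the example short.

```python
def classify(claim: str, context_facts: set, world_facts: set) -> str:
    """Label a claim on two independent axes:
    faithful = supported by the context, correct = true in the world."""
    faithful = claim in context_facts
    correct = claim in world_facts
    if faithful and correct:
        return "faithful and correct"
    if faithful:
        return "faithful but incorrect"
    if correct:
        return "correct but unfaithful"
    return "unfaithful and incorrect"


# Fictional data: the retrieved source states the wrong opening year.
context_facts = {"Fictional Bridge opened in 1930"}
world_facts = {"Fictional Bridge opened in 1932"}

print(classify("Fictional Bridge opened in 1930", context_facts, world_facts))
print(classify("Fictional Bridge opened in 1932", context_facts, world_facts))
```

The first claim repeats the source's error and is therefore faithful but incorrect; the second is true but has no support in the context, so it is correct but unfaithful.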

Examples

  • An answer where every sentence traces to the retrieved passage scores high on faithfulness and groundedness.
  • An answer that adds a plausible but unsupported statistic contains an ungrounded claim, which lowers its faithfulness score even if every other claim is grounded.

References

  1. Es, S. et al. (2023). "RAGAS: Automated Evaluation of Retrieval Augmented Generation." arXiv:2309.15217. https://arxiv.org/abs/2309.15217