Faithfulness

Property evaluation published

Also known as: Factual Consistency, Source Fidelity

Definition

The degree to which generated text accurately reflects, and is supported by, the source material or context it was given. A faithful response contains only claims that can be verified against those sources and adds no unsupported information. Faithfulness is a key metric for evaluating retrieval-augmented generation (RAG) systems.
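
One common way to turn this definition into a number is the fraction of claims in the response that the sources support, which is how the RAGAS faithfulness metric is generally described. The sketch below assumes the per-claim verdicts already exist; `ClaimVerdict` and `faithfulness_score` are hypothetical names used for illustration, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class ClaimVerdict:
    claim: str
    supported: bool  # True if the claim can be verified from the sources


def faithfulness_score(verdicts: list[ClaimVerdict]) -> float:
    """Fraction of claims supported by the sources (1.0 means fully faithful)."""
    if not verdicts:
        raise ValueError("no claims to score")
    return sum(v.supported for v in verdicts) / len(verdicts)


# Illustrative verdicts (assumed, not produced by a real judge).
verdicts = [
    ClaimVerdict("The tower is in Paris.", True),
    ClaimVerdict("It was completed in 1889.", True),
    ClaimVerdict("It is repainted every year.", False),
]
score = faithfulness_score(verdicts)  # two of three claims are supported
```

The score says nothing about whether the sources themselves are right, or whether the answer is complete or relevant.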

What this is NOT

  • Not correctness (a response can be faithful to a source that is itself wrong)
  • Not completeness (a faithful answer can omit information from the source)
  • Not relevance (an answer can be faithful yet fail to address the question)

Alternative Interpretations

Different communities use this term differently:

llm-practitioners

A quality measure for RAG outputs: does the generated answer accurately represent what the retrieved documents say, without adding claims that are not in the sources? Evaluated with faithfulness metrics such as the one provided by RAGAS.

Sources: RAGAS faithfulness metric, RAG evaluation frameworks, Summarization evaluation literature

Examples

  • RAGAS faithfulness score for RAG evaluation
  • NLI-based verification of generated claims
  • Checking each sentence against source documents
  • LLM judge evaluating 'is this claim supported by the context?'
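
The sentence-by-sentence approach from the examples above can be sketched as follows. This is a minimal illustration, not a production metric: `token_overlap_judge` is a toy stand-in (an assumption of this sketch) for an NLI model or LLM judge, and the splitter, the 0.8 threshold, and all function names are hypothetical.

```python
import re
from typing import Callable

# A judge takes (claim, context) and returns True when the context supports the claim.
Judge = Callable[[str, str], bool]


def split_sentences(text: str) -> list[str]:
    """Naive split after ., ! or ?; a real pipeline would use a proper tokenizer."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def token_overlap_judge(claim: str, context: str, threshold: float = 0.8) -> bool:
    """Toy judge: share of the claim's tokens that also occur in the context."""
    claim_tokens = set(re.findall(r"\w+", claim.lower()))
    context_tokens = set(re.findall(r"\w+", context.lower()))
    if not claim_tokens:
        return True
    return len(claim_tokens & context_tokens) / len(claim_tokens) >= threshold


def unsupported_sentences(answer: str, context: str,
                          judge: Judge = token_overlap_judge) -> list[str]:
    """Return the answer sentences that the judge does not find supported."""
    return [s for s in split_sentences(answer) if not judge(s, context)]


context = "The library was founded in 1901. It holds about two million volumes."
answer = "The library was founded in 1901. It is the largest library in Europe."
flagged = unsupported_sentences(answer, context)
```

Keeping the judge as a plain callable means the overlap heuristic can be swapped for an entailment model or an LLM prompt asking 'is this claim supported by the context?' without touching the rest of the code.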

Counterexamples

Things that might seem like Faithfulness but are not:

  • Model adding information that is not in the context (unfaithful)
  • Model contradicting the provided sources
  • Model inventing citations
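
Of the failures listed above, invented citations are the easiest to detect mechanically. The sketch below assumes citations appear as bracketed markers such as `[3]` and that the identifiers of the provided sources are known; both the marker format and the `invented_citations` helper are assumptions made for illustration.

```python
import re


def invented_citations(answer: str, source_ids: set[str]) -> set[str]:
    """Return citation markers in the answer that match no provided source."""
    cited = set(re.findall(r"\[(\w+)\]", answer))
    return cited - source_ids


answer = "Sales rose 12% [1], driven by exports [4]."
missing = invented_citations(answer, {"1", "2", "3"})  # marker 4 has no source
```

A check like this only confirms that a cited source exists; whether the source actually supports the claim still needs a claim-level check.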

Relations

  • inTensionWith hallucination (Hallucination is unfaithfulness)
  • overlapsWith grounding (Grounding enables faithfulness)
  • overlapsWith benchmark (Faithfulness is often benchmarked)

Implementations

Tools and frameworks that implement this concept: