VeriTrail: Closed-Domain Hallucination Detection with Traceability

Dasha Metropolitansky, Jonathan Larson

TL;DR

This paper tackles closed-domain hallucination in language models, focusing on processes with multiple generative steps (MGS) rather than a single generative step (SGS). It introduces VeriTrail, a faithfulness detector with traceability that reasons over a DAG representation of the entire generation pipeline. VeriTrail combines claim extraction, sub-claim decomposition, evidence selection, and verdict generation, with a hyperparameter $q$ that controls iterative verification. Two new datasets, FABLES+ and DiverseSumm+, are created to evaluate traceability across MGS and SGS, with intermediate outputs and human annotations. Empirical results show that VeriTrail outperforms natural language inference, retrieval-augmented generation, and long-context baselines in both hard and soft prediction settings. Ablations confirm the contributions of evidence selection and traceability and highlight the computational trade-offs controlled by $q$.

Abstract

Even when instructed to adhere to source material, Language Models often generate unsubstantiated content - a phenomenon known as "closed-domain hallucination." This risk is amplified in processes with multiple generative steps (MGS), compared to processes with a single generative step (SGS). However, due to the greater complexity of MGS processes, we argue that detecting hallucinations in their final outputs is necessary but not sufficient: it is equally important to trace where hallucinated content was likely introduced and how faithful content may have been derived from the source through intermediate outputs. To address this need, we present VeriTrail, the first closed-domain hallucination detection method designed to provide traceability for both MGS and SGS processes. We also introduce the first datasets to include all intermediate outputs as well as human annotations of final outputs' faithfulness for their respective MGS processes. We demonstrate that VeriTrail outperforms baseline methods on both datasets.

Paper Structure

This paper contains 41 sections, 2 equations, 4 figures, 9 tables, 1 algorithm.

Figures (4)

  • Figure 1: Left: Hierarchical summarization as a DAG. Right: VeriTrail’s verification process. The evidence trail includes Sentence 2 from Node 10, Sentence 16 from Node 8, and Sentence 81 from Node 4. Evidence summaries are not shown.
  • Figure 2: Hard prediction results for all AlignScore configurations on the FABLES+ and DiverseSumm+ datasets. We varied the threshold $\tau$ used to convert entailment probabilities into binary labels and the number of chunk-level probabilities averaged ($k$). Each value shows the performance for a specific ($\tau$, $k$) pair.
  • Figure 3: Hard prediction results for all INFUSE configurations on the FABLES+ and DiverseSumm+ datasets. We varied the threshold $\tau$ used to convert entailment probabilities into binary labels. Dashed lines indicate the best result across all methods from the hard prediction results table.
  • Figure 4: Hard prediction results for all RAG configurations on the FABLES+ and DiverseSumm+ datasets. We varied the top-$k$ chunks retrieved. Dashed lines indicate the best result across all methods from the hard prediction results table.
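
The Figure 2 caption describes turning chunk-level entailment probabilities into a binary label by averaging $k$ of them and comparing the result to a threshold $\tau$. The following is a minimal sketch of that conversion, not the paper's or AlignScore's implementation. The function names `hard_label` and `sweep` are hypothetical, and two details are assumptions the caption does not state: that the $k$ highest probabilities are the ones averaged, and that a score equal to $\tau$ counts as supported.

```python
from statistics import mean


def hard_label(chunk_probs, tau, k):
    """Return (is_supported, score) for one claim.

    chunk_probs: entailment probabilities, one per source chunk.
    tau: threshold for converting the averaged probability to a binary label.
    k: number of chunk-level probabilities to average.
    ASSUMPTION: the k highest probabilities are averaged.
    ASSUMPTION: score >= tau (rather than > tau) means "supported".
    """
    if not chunk_probs:
        raise ValueError("need at least one chunk-level probability")
    if k < 1:
        raise ValueError("k must be at least 1")
    # If fewer than k chunks exist, all of them are averaged.
    top_k = sorted(chunk_probs, reverse=True)[:k]
    score = mean(top_k)
    return score >= tau, score


def sweep(chunk_probs, taus, ks):
    """Label the same claim under every (tau, k) pair, as in a grid search."""
    return {
        (tau, k): hard_label(chunk_probs, tau, k)[0]
        for tau in taus
        for k in ks
    }


if __name__ == "__main__":
    probs = [0.9, 0.2, 0.7]  # illustrative values only, not from the paper
    for (tau, k), label in sorted(sweep(probs, [0.5, 0.75], [1, 2, 3]).items()):
        print(f"tau={tau:.2f} k={k} supported={label}")
```

Larger $k$ pulls weaker chunks into the average, so the same claim can flip from supported to unsupported at a fixed $\tau$, which is why the two parameters are varied together.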