VeriTrail: Closed-Domain Hallucination Detection with Traceability
Dasha Metropolitansky, Jonathan Larson
TL;DR
This paper tackles closed-domain hallucination in language models, with a focus on processes with multiple generative steps (MGS), in contrast to processes with a single generative step (SGS). It introduces VeriTrail, a faithfulness detector with traceability that represents the entire generation pipeline as a directed acyclic graph (DAG) and reasons over it. VeriTrail combines claim extraction, sub-claim decomposition, evidence selection, and verdict generation, with a hyperparameter $q$ that controls iterative verification. Two new datasets, FABLES+ and DiverseSumm+, include intermediate outputs and human annotations, and were created to evaluate traceability across MGS and SGS processes. Empirical results show that VeriTrail outperforms natural language inference, retrieval-augmented generation, and long-context baselines in both hard and soft prediction settings. Ablations confirm the contributions of evidence selection and traceability, and highlight the computational trade-offs controlled by $q$.
Abstract
Even when instructed to adhere to source material, language models often generate unsubstantiated content, a phenomenon known as "closed-domain hallucination." This risk is amplified in processes with multiple generative steps (MGS), compared to processes with a single generative step (SGS). However, due to the greater complexity of MGS processes, we argue that detecting hallucinations in their final outputs is necessary but not sufficient: it is equally important to trace where hallucinated content was likely introduced and how faithful content may have been derived from the source through intermediate outputs. To address this need, we present VeriTrail, the first closed-domain hallucination detection method designed to provide traceability for both MGS and SGS processes. We also introduce the first datasets to include all intermediate outputs as well as human annotations of final outputs' faithfulness for their respective MGS processes. We demonstrate that VeriTrail outperforms baseline methods on both datasets.
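
The traceability idea described above can be illustrated with a toy sketch. It is not VeriTrail's implementation: the `Node` class, the `trace_claim` function, and the substring-based `supports` check are hypothetical stand-ins chosen for illustration (an actual detector would rely on a language model for evidence selection and verdicts). The sketch assumes the generative process is stored as a directed acyclic graph in which each node records the nodes it was generated from, and it walks backward from the final output to find either a trail to the source or the node where a claim first appears without support.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One text artifact: a source chunk, an intermediate output, or the final output."""
    node_id: str
    text: str
    inputs: list = field(default_factory=list)  # ids of the nodes this one was generated from


def supports(text: str, claim: str) -> bool:
    # Hypothetical stand-in for an LM-based evidence check: plain substring match.
    return claim.lower() in text.lower()


def trace_claim(graph: dict, final_id: str, claim: str) -> dict:
    """Walk backward from the final output, following inputs that support the claim."""
    current = [final_id]
    trail = []  # one list of supporting node ids per step back toward the source
    while True:
        parent_ids = sorted({p for nid in current for p in graph[nid].inputs})
        supporting = [p for p in parent_ids if supports(graph[p].text, claim)]
        if not supporting:
            # No input of the current node(s) supports the claim, so it was
            # likely introduced when the current node(s) were generated.
            return {"verdict": "unsupported", "trail": trail, "introduced_at": current}
        trail.append(supporting)
        if any(not graph[p].inputs for p in supporting):
            # A root (source) node supports the claim: the trail reaches the source.
            return {"verdict": "supported", "trail": trail, "introduced_at": None}
        current = supporting


# Toy two-step process: sources -> intermediate summaries -> final output.
graph = {
    "s1": Node("s1", "The bridge opened in 1932."),
    "s2": Node("s2", "It spans the harbour."),
    "m1": Node("m1", "The bridge opened in 1932 and was painted red.", ["s1"]),
    "m2": Node("m2", "It spans the harbour.", ["s2"]),
    "f": Node("f", "The bridge opened in 1932, was painted red, and spans the harbour.", ["m1", "m2"]),
}

faithful = trace_claim(graph, "f", "opened in 1932")
hallucinated = trace_claim(graph, "f", "painted red")
print(faithful)
print(hallucinated)
```

In this toy graph, the claim "opened in 1932" is traced through `m1` back to the source `s1`, while "painted red" is found in `m1` but in none of its inputs, which points to the intermediate step as the place where the unsupported content entered.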
