TDGNet: Hallucination Detection in Diffusion Language Models via Temporal Dynamic Graphs
Arshia Hemmat, Philip Torr, Yongqiang Chen, Junchi Yu
TL;DR
TDGNet tackles hallucination detection in diffusion language models by modeling the denoising process as a temporal dynamic graph over token attention. It maintains a per-token memory and uses a three-stage pipeline (spatial message passing with a graph neural network, memory updating, and trajectory-aware readout) to detect factuality signals that accumulate across denoising steps. The approach yields consistent AUROC gains over output-based, latent-based, and static-graph baselines on LLaDA-8B and Dream-7B and enables fine-grained localization of hallucinated spans, all with single-pass inference and modest overhead. The results underscore the importance of temporal reasoning on attention graphs for robust hallucination detection in diffusion models and offer a practical path toward safer deployment of diffusion LLMs (D-LLMs).
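The three-stage loop can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: it assumes one attention matrix per denoising step, attention-weighted mean aggregation for message passing, a fixed convex blend (gate `alpha`) in place of a learned memory update, mean pooling into a per-step summary, and softmax temporal attention with a fixed `query` vector followed by a linear `head` and a sigmoid. All function names, `alpha`, `query`, and `head` are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def message_pass(adj: Matrix, memory: List[Vector]) -> List[Vector]:
    """Stage 1 (hypothetical): each token takes the attention-weighted mean of its neighbours' memories."""
    n, d = len(memory), len(memory[0])
    out = []
    for i in range(n):
        total = sum(adj[i])
        agg = [0.0] * d
        for j in range(n):
            w = adj[i][j]
            if w:
                for k in range(d):
                    agg[k] += w * memory[j][k]
        # Isolated tokens (no incoming edges) keep their current memory.
        out.append([a / total for a in agg] if total > 0 else list(memory[i]))
    return out


def update_memory(memory: List[Vector], messages: List[Vector], alpha: float = 0.5) -> List[Vector]:
    """Stage 2 (hypothetical): convex blend of old memory and new message, a stand-in for a learned gated update."""
    return [[(1 - alpha) * m + alpha * g for m, g in zip(mem, msg)]
            for mem, msg in zip(memory, messages)]


def temporal_readout(step_summaries: List[Vector], query: Vector) -> Vector:
    """Stage 3 (hypothetical): softmax attention over per-step summaries of the trajectory."""
    scores = [sum(q * s for q, s in zip(query, summ)) for summ in step_summaries]
    mx = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - mx) for s in scores]
    z = sum(weights)
    weights = [w / z for w in weights]
    d = len(step_summaries[0])
    return [sum(w * summ[k] for w, summ in zip(weights, step_summaries)) for k in range(d)]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def detect(adjs: List[Matrix], init_memory: List[Vector], query: Vector, head: Vector) -> float:
    """Run the loop over all denoising steps and return a hallucination probability."""
    memory = init_memory
    summaries = []
    for adj in adjs:  # one attention graph per denoising step
        messages = message_pass(adj, memory)
        memory = update_memory(memory, messages)
        # Mean-pool token memories into one summary vector for this step.
        summaries.append([sum(col) / len(memory) for col in zip(*memory)])
    pooled = temporal_readout(summaries, query)
    return sigmoid(sum(h * p for h, p in zip(head, pooled)))
```

For example, `detect([[[1, 1, 0], [1, 1, 1], [0, 1, 1]]] * 2, [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], query=[1.0, 1.0], head=[0.3, -0.2])` scores a toy three-token, two-step trajectory. Keeping the memory across steps is what lets evidence that appears early and is later self-corrected still influence the readout.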
Abstract
Diffusion language models (D-LLMs) offer parallel denoising and bidirectional context, but hallucination detection for D-LLMs remains underexplored. Prior detectors developed for auto-regressive LLMs typically rely on single-pass cues and do not transfer directly to diffusion generation, where factuality evidence is distributed across the denoising trajectory and may appear, drift, or be self-corrected over time. We introduce TDGNet, a temporal dynamic graph framework that formulates hallucination detection as learning over evolving token-level attention graphs. At each denoising step, we sparsify the attention graph and update per-token memories via message passing; we then apply temporal attention to aggregate evidence from the whole trajectory for the final prediction. Experiments on LLaDA-8B and Dream-7B across QA benchmarks show consistent AUROC improvements over output-based, latent-based, and static-graph baselines, while requiring only single-pass inference and modest overhead. These results highlight the importance of temporal reasoning on attention graphs for robust hallucination detection in diffusion language models.
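The abstract does not specify how the attention graph is sparsified at each step. One common choice, assumed here purely for illustration, keeps each token's `k` largest attention weights, zeros the rest, and renormalizes each row; the surviving entries then become the edges used for message passing. The names `sparsify_topk` and `to_edge_list` and the parameter `k` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def sparsify_topk(attn: Matrix, k: int) -> Matrix:
    """Keep the k largest weights per row (ties go to the lower column index),
    zero the rest, and renormalise each row to sum to 1 (hypothetical scheme)."""
    if k < 1:
        raise ValueError("k must be >= 1")
    sparse = []
    for row in attn:
        # Column indices of the k largest entries in this row.
        keep = set(sorted(range(len(row)), key=lambda j: (-row[j], j))[:k])
        kept = [row[j] if j in keep else 0.0 for j in range(len(row))]
        total = sum(kept)
        sparse.append([w / total for w in kept] if total > 0 else kept)
    return sparse


def to_edge_list(adj: Matrix) -> List[Tuple[int, int, float]]:
    """Turn a sparsified adjacency matrix into (token_i, token_j, weight) edges."""
    return [(i, j, w) for i, row in enumerate(adj) for j, w in enumerate(row) if w > 0]
```

Top-k per row bounds every token's degree by `k`, so the per-step message-passing cost grows linearly with sequence length rather than quadratically; a global attention threshold would be an equally plausible alternative.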
