TDGNet: Hallucination Detection in Diffusion Language Models via Temporal Dynamic Graphs

Arshia Hemmat, Philip Torr, Yongqiang Chen, Junchi Yu

TL;DR

TDGNet tackles hallucination detection in diffusion language models by modeling the denoising process as a temporal dynamic graph over token attention. It maintains per-token memory and uses a three-stage pipeline (spatial graph neural message passing, memory updating, and trajectory-aware readout) to detect factuality signals that accumulate across denoising steps. The approach yields consistent AUROC gains over output-based, latent-based, and static-graph baselines on LLaDA-8B and Dream-7B while enabling fine-grained localization of hallucinated spans, all with single-pass inference and modest overhead. The results underscore the importance of temporal reasoning on attention graphs for robust diffusion-based hallucination detection and offer a practical path toward safer deployment of D-LLMs.

Abstract

Diffusion language models (D-LLMs) offer parallel denoising and bidirectional context, but hallucination detection for D-LLMs remains underexplored. Prior detectors developed for auto-regressive LLMs typically rely on single-pass cues and do not directly transfer to diffusion generation, where factuality evidence is distributed across the denoising trajectory and may appear, drift, or be self-corrected over time. We introduce TDGNet, a temporal dynamic graph framework that formulates hallucination detection as learning over evolving token-level attention graphs. At each denoising step, we sparsify the attention graph and update per-token memories via message passing, then apply temporal attention to aggregate trajectory-wide evidence for final prediction. Experiments on LLaDA-8B and Dream-7B across QA benchmarks show consistent AUROC improvements over output-based, latent-based, and static-graph baselines, with single-pass inference and modest overhead. These results highlight the importance of temporal reasoning on attention graphs for robust hallucination detection in diffusion language models.
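
To make the pipeline in the abstract concrete, the following is a minimal NumPy sketch of its three steps: sparsify each step's attention graph, update per-token memories by message passing, and aggregate the trajectory with temporal attention. It is an illustration under stated assumptions, not the authors' implementation. Top-k sparsification, the gated memory update, mean pooling over tokens, and every name here (`sparsify_top_k`, `update_memory`, `temporal_readout`, `score_trajectory`, the `params` keys) are hypothetical choices, and the weights are random rather than trained.

```python
import numpy as np

def sparsify_top_k(attn, k):
    """Keep the k largest weights in each row of an (n, n) attention map."""
    n = attn.shape[0]
    idx = np.argsort(attn, axis=1)[:, -k:]          # top-k column indices per row
    mask = np.zeros_like(attn, dtype=bool)
    mask[np.arange(n)[:, None], idx] = True
    sparse = np.where(mask, attn, 0.0)
    row_sums = sparse.sum(axis=1, keepdims=True)
    return sparse / np.maximum(row_sums, 1e-12)     # renormalise surviving edges

def update_memory(memory, adj, w_msg, w_gate):
    """One message-passing step followed by a gated per-token memory update."""
    messages = np.tanh(adj @ memory @ w_msg)        # aggregate neighbour memories
    gate = 1.0 / (1.0 + np.exp(-(memory @ w_gate))) # how much old memory to keep
    return gate * memory + (1.0 - gate) * messages

def temporal_readout(states, query):
    """Softmax attention over T per-step summaries of shape (T, d)."""
    scores = states @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ states, weights

def score_trajectory(attn_steps, features, params, k=4):
    """Return a hallucination probability and the per-step attention weights."""
    memory = features.copy()
    summaries = []
    for attn in attn_steps:                         # one graph per denoising step
        adj = sparsify_top_k(attn, k)
        memory = update_memory(memory, adj, params["w_msg"], params["w_gate"])
        summaries.append(memory.mean(axis=0))       # assumed graph-level pooling
    pooled, weights = temporal_readout(np.stack(summaries), params["query"])
    logit = pooled @ params["w_out"]
    return 1.0 / (1.0 + np.exp(-logit)), weights

# Toy trajectory: T denoising steps, n tokens, d-dimensional token features.
rng = np.random.default_rng(0)
T, n, d = 6, 10, 8
raw = rng.normal(size=(T, n, n))
attn_steps = np.exp(raw) / np.exp(raw).sum(axis=2, keepdims=True)
features = rng.normal(size=(n, d))
params = {
    "w_msg": rng.normal(size=(d, d)) / np.sqrt(d),
    "w_gate": rng.normal(size=(d, d)) / np.sqrt(d),
    "query": rng.normal(size=d),
    "w_out": rng.normal(size=d),
}
prob, step_weights = score_trajectory(attn_steps, features, params)
```

In this sketch `step_weights` shows how much each denoising step contributes to the final score, which is the kind of trajectory-wide aggregation the abstract describes. A trained model would learn these parameters from labeled trajectories.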

Paper Structure

This paper contains 83 sections, 13 equations, 3 figures, 11 tables, 1 algorithm.

Figures (3)

  • Figure 1: Why Temporal Modeling is Essential. We visualize four diffusion dynamics that confound static baselines: Self-Correction (early noise resolves into fact), Correctness Decay and Semantic Drift (factual or plausible states degrade into hallucinations), and Persistent Error (errors lock in early). TDGNet aggregates structural signals across the denoising trajectory to distinguish these evolving patterns, avoiding the pitfalls of single-snapshot detectors.
  • Figure 2: Overview of the TDGNet framework. The model extracts attention maps across diffusion denoising steps to construct a temporal dynamic graph, which is then processed by a Temporal Graph Network to predict hallucination probability.
  • Figure 3: Temporal Consistency Analysis. We compare the inherent validity signal in the data (blue, "Branching Fate") against the accuracy of static snapshots (orange, CHARM). Key insight: while the data contains strong early signals (>81% consistency), static models fail to capture them (dropping to 49.1% accuracy). This gap mathematically motivates TDGNet: the signal exists in the trajectory but requires temporal modeling to extract.