Beyond Confidence: Adaptive and Coherent Decoding for Diffusion Language Models
Kecheng Chen, Ziru Liu, Xijia Tao, Hui Liu, Xinyu Fu, Suiyun Zhang, Dandan Tu, Lingpeng Kong, Rui Liu, Haoliang Li
TL;DR
This work tackles unstable and inefficient inference in diffusion language models by introducing Coherent Contextual Decoding (CCD), which uses a trajectory-based context-consistency measure, grounded in conditional mutual information, to rectify suboptimal decoding paths. It combines a marginalized-context target distribution with a sliding-window historical buffer to enable adaptive, per-step budgeting that favors coherent trajectories and accelerates sampling. Empirical results on Dream and LLaDA show substantial speedups (up to 3.48x) and performance gains across math, code, and planning benchmarks, and these results remain robust under varying hyperparameters. The approach integrates with existing inference optimizations and demonstrates the value of leveraging history-rich context for reliable, efficient diffusion-based generation.
Abstract
Diffusion Language Models (DLMs) have recently achieved significant success due to their any-order generation capabilities. However, existing inference methods typically rely on local, immediate-step metrics such as confidence or entropy, which capture only the current step and therefore give an unreliable view of the overall trajectory. This limitation frequently leads to inconsistent sampling trajectories and suboptimal generation quality. To address this, we propose Coherent Contextual Decoding (CCD), a novel inference framework built upon two core innovations. First, CCD employs a trajectory rectification mechanism that leverages historical context to enhance sequence coherence, enabling the early rejection of suboptimal paths. We demonstrate that this mechanism is theoretically equivalent to modeling the consistency of historical steps via the conditional mutual information between context and token predictions. Second, building on this theoretical insight, we address the inefficiency of conventional uniform decoding budgets. Instead of a rigid allocation tied to the number of diffusion steps, we introduce an adaptive sampling strategy that dynamically adjusts the unmasking budget at each step according to our consistency metric. Consequently, our method significantly improves the quality of generation trajectories while accelerating the sampling process. Empirically, our method improves both inference speed and task performance across diverse benchmarks on Dream and LLaDA, delivering up to a 3.48x speedup alongside a 3.91% performance improvement.
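The two ideas in the abstract, a history-aware consistency signal and a per-step unmasking budget that adapts to it, can be pictured with a small sketch. This is an illustration under stated assumptions, not the CCD algorithm itself: the names `HistoryBuffer` and `select_positions`, the plain averaging over buffered steps, and the fixed `threshold` are all hypothetical stand-ins, since the abstract does not specify the actual consistency metric or budgeting rule.

```python
from collections import deque

import numpy as np


class HistoryBuffer:
    """Sliding window over the last `window` per-step predictive distributions.

    Hypothetical helper: the abstract does not describe the real buffer.
    """

    def __init__(self, window: int):
        self.steps = deque(maxlen=window)  # oldest step is dropped automatically

    def push(self, probs: np.ndarray) -> None:
        # probs has shape (seq_len, vocab_size); each row is a distribution.
        self.steps.append(np.asarray(probs, dtype=float))

    def averaged(self) -> np.ndarray:
        # Plain mean over buffered steps: an assumed, crude stand-in for
        # marginalizing predictions over historical contexts.
        return np.mean(np.stack(list(self.steps)), axis=0)


def select_positions(buffer: HistoryBuffer, masked: np.ndarray, threshold: float = 0.9):
    """Return (positions, tokens) to unmask at this step.

    The number of positions is not fixed in advance: every masked position
    whose history-averaged top probability reaches `threshold` is unmasked.
    """
    masked = np.asarray(masked, dtype=bool)
    if not masked.any():
        return np.array([], dtype=int), np.array([], dtype=int)

    avg = buffer.averaged()
    top_prob = avg.max(axis=1)
    top_token = avg.argmax(axis=1)

    candidates = np.flatnonzero(masked & (top_prob >= threshold))
    if candidates.size == 0:
        # Guarantee progress: unmask the single most consistent masked position.
        masked_idx = np.flatnonzero(masked)
        candidates = masked_idx[[int(np.argmax(top_prob[masked_idx]))]]
    return candidates, top_token[candidates]


# Toy example: 3 positions, vocabulary of 4 tokens, two buffered steps.
buffer = HistoryBuffer(window=2)
buffer.push(np.array([[0.97, 0.01, 0.01, 0.01],
                      [0.10, 0.70, 0.10, 0.10],
                      [0.25, 0.25, 0.25, 0.25]]))
buffer.push(np.array([[0.95, 0.03, 0.01, 0.01],
                      [0.70, 0.10, 0.10, 0.10],
                      [0.40, 0.30, 0.20, 0.10]]))
positions, tokens = select_positions(buffer, masked=np.array([True, True, True]))
```

In this toy input, position 1 looks fairly confident at each individual step (top probability 0.70) but changes its preferred token between steps, so its averaged top probability drops to 0.40 and it stays masked. Position 0 is stable across both steps and is the only one selected. A purely local confidence rule would not distinguish these two cases as sharply, which is the intuition behind using history rather than a single-step metric.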
