Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models

Jinsong Li, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Jiaqi Wang, Dahua Lin

TL;DR

The paper tackles the fixed-generation-length bottleneck in Diffusion Large Language Models (DLLMs) by introducing DAEDAL, a training-free, two-stage approach that dynamically adapts the output length to each problem. Before denoising, Initial Length Adjustment uses the model's EOS confidence to set a coarse length; during denoising, Iterative Mask Insertion adds mask tokens where the response needs more room for reasoning. Experiments on multiple DLLMs and benchmarks show that DAEDAL matches or exceeds well-tuned fixed-length baselines while improving efficiency through a higher effective token ratio. By adapting length per problem, this work narrows a key gap between diffusion-based and autoregressive generation.
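Initial Length Adjustment can be illustrated with a minimal Python sketch. It assumes a hypothetical callback `eos_confidence(prompt, length)` that returns the model's EOS probability at each position after one forward pass over the prompt followed by `length` mask tokens. The starting length, step size, number of terminal positions averaged (`tail`), and threshold value are illustrative assumptions, not the paper's settings.

```python
from typing import Callable, List

# Hypothetical model hook (an assumption): (prompt, response_length) -> EOS
# probability at each of the `response_length` fully masked response positions.
EosConfidenceFn = Callable[[str, int], List[float]]


def initial_length_adjustment(
    prompt: str,
    eos_confidence: EosConfidenceFn,
    init_len: int = 64,      # illustrative short starting length
    expand_step: int = 64,   # illustrative expansion increment
    max_len: int = 2048,     # hard cap on the response length
    tau_eos: float = 0.5,    # illustrative EOS-confidence threshold
    tail: int = 8,           # assumed size of the "sequence terminus"
) -> int:
    """Grow a fully masked response until the model signals it has enough room."""
    length = init_len
    while True:
        probs = eos_confidence(prompt, length)
        # Sequence completion metric: mean EOS confidence at the sequence terminus.
        terminus = probs[-tail:]
        score = sum(terminus) / len(terminus)
        if score >= tau_eos or length >= max_len:
            return length
        # Not confident the answer fits: append more mask tokens and re-check.
        length = min(length + expand_step, max_len)


# Toy stand-in for a DLLM: EOS is confidently predicted at the end only once
# the masked response is at least `needed` tokens long.
def toy_eos_confidence(prompt: str, length: int, needed: int = 200) -> List[float]:
    return [0.9 if length >= needed else 0.1] * length


print(initial_length_adjustment("toy prompt", toy_eos_confidence))  # 256
```

In the toy run, lengths 64, 128, and 192 leave the terminal EOS confidence low, so the sketch expands until 256, the first length that covers the 200 tokens the toy problem "needs".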

Abstract

Diffusion Large Language Models (DLLMs) are emerging as a powerful alternative to the dominant Autoregressive Large Language Models, offering efficient parallel generation and capable global context modeling. However, the practical application of DLLMs is hindered by a critical architectural constraint: the need for a statically predefined generation length. This static length allocation leads to a problematic trade-off: insufficient lengths cripple performance on complex tasks, while excessive lengths incur significant computational overhead and sometimes result in performance degradation. While the inference framework is rigid, we observe that the model itself possesses internal signals that correlate with the optimal response length for a given task. To bridge this gap, we leverage these latent signals and introduce DAEDAL, a novel training-free denoising strategy that enables Dynamic Adaptive Length Expansion for Diffusion Large Language Models. DAEDAL operates in two phases: 1) Before the denoising process, DAEDAL starts from a short initial length and iteratively expands it to a coarse task-appropriate length, guided by a sequence completion metric. 2) During the denoising process, DAEDAL dynamically intervenes by pinpointing and expanding insufficient generation regions through mask token insertion, ensuring the final output is fully developed. Extensive experiments on DLLMs demonstrate that DAEDAL achieves performance comparable, and in some cases superior, to meticulously tuned fixed-length baselines, while simultaneously enhancing computational efficiency by achieving a higher effective token ratio. By resolving the static length constraint, DAEDAL unlocks new potential for DLLMs, bridging a critical gap with their Autoregressive counterparts and paving the way for more efficient and capable generation.
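The second phase, expanding under-developed regions during denoising, can be sketched in the same spirit. This Python sketch assumes a hypothetical `predict(seq)` hook that returns a (token, confidence) pair for every position. The threshold names `fill_threshold` and `expand_threshold`, the `expand_factor` of 4, and the rule of expanding only the single least-confident mask per step are our own illustrative choices; they are not claimed to match the paper's $\tau$ parameters or its exact schedule.

```python
from typing import Callable, List, Tuple

MASK = "[MASK]"
# Hypothetical model hook (an assumption): current sequence -> (predicted token,
# confidence) for every position.
Predictor = Callable[[List[str]], List[Tuple[str, float]]]


def denoise_with_mask_insertion(
    seq: List[str],
    predict: Predictor,
    fill_threshold: float = 0.9,   # illustrative, not the paper's tau value
    expand_threshold: float = 0.1,  # illustrative, not the paper's tau value
    expand_factor: int = 4,         # one [MASK] becomes this many [MASK]s
    max_len: int = 256,
    max_steps: int = 1000,
) -> List[str]:
    seq = list(seq)
    for _ in range(max_steps):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked:
            break
        preds = predict(seq)
        # Unmask every confident position, and always the single most confident
        # one so that each step makes progress.
        best = max(masked, key=lambda i: preds[i][1])
        to_fill = {i for i in masked if preds[i][1] >= fill_threshold} | {best}
        # The least confident remaining mask is a candidate expansion point.
        rest = [i for i in masked if i not in to_fill]
        expand_at = min(rest, key=lambda i: preds[i][1], default=None)
        for i in to_fill:
            seq[i] = preds[i][0]
        if (
            expand_at is not None
            and preds[expand_at][1] < expand_threshold
            and len(seq) + expand_factor - 1 <= max_len
        ):
            # Replace that one [MASK] with a block of `expand_factor` masks.
            seq[expand_at : expand_at + 1] = [MASK] * expand_factor
    return seq


# Toy predictor: low confidence at the last position signals "not enough room".
def toy_predict(seq: List[str], needed: int = 10) -> List[Tuple[str, float]]:
    n = len(seq)
    return [(f"w{i}", 0.05 if n < needed and i == n - 1 else 0.95) for i in range(n)]


print(len(denoise_with_mask_insertion([MASK] * 4, toy_predict)))  # 10
```

Starting from only four masks, the toy run expands twice (4 to 7 to 10 positions) and then fills the rest, so the final length is set by the model's low-confidence signal rather than fixed in advance.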

Paper Structure

This paper contains 14 sections, 5 figures, 5 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Overview of DAEDAL's effectiveness on LLaDA-Instruct-8B. (a) DAEDAL uses a unified, short initial length and consistently surpasses the baseline, whose length must be meticulously tuned for each benchmark to reach peak performance. (b) DAEDAL dynamically adjusts length and adaptively expands on a per-problem basis, resulting in a varied distribution of response lengths. In contrast, the baseline is constrained to a fixed length for all problems.
  • Figure 2: Visualization of the DLLM's awareness of length sufficiency. The heatmaps show the difference in average EOS token confidence at the sequence terminus, measured after the first prediction on a fully masked 128-token input. This difference is the result of subtracting the average confidence on length-insufficient problems (those answered correctly only with a much longer sequence) from that on length-sufficient problems (those answered correctly under 128 tokens). The experiment is conducted with LLaDA-Instruct-8B. The predominantly green color (difference $>0$) indicates that EOS confidence is higher for length-sufficient problems, validating our core insight.
  • Figure 3: Inference process of Fixed-Length Denoising (Baseline) and DAEDAL. (a) The standard inference process for current DLLMs, which performs iterative denoising on a sequence of a predefined, static length. (b) Our proposed two-stage inference process, which first employs Initial Length Adjustment to determine an appropriate generation length before denoising, followed by Iterative Mask Insertion to expand the sequence on demand during the denoising process.
  • Figure 4: Distribution of individual Response Lengths ($\boldsymbol{N_{token}}$) on LLaDA-Instruct-8B. The figure compares the distribution of total tokens used per problem by DAEDAL (orange histogram) and the baseline (blue histogram) across four benchmarks. DAEDAL's dynamic, per-problem adaptation results in a varied distribution of lengths. In contrast, the baseline is constrained to a single fixed length for all problems within a benchmark, represented by a single bar in its histogram.
  • Figure 5: Ablation Results on DAEDAL's Thresholds. The two 4×4 heatmaps present a grid search over two interdependent threshold pairs, ($\tau_{high}, \tau_{low}$) and ($\tau_{eos}, \tau_{expand}$). All 32 configurations were evaluated on GSM8K using LLaDA-Instruct-8B. Darker green indicates higher accuracy, and the color bar also marks the baseline's performance for reference. Our default settings are outlined in blue boxes. The results are remarkably stable: every configuration performs comparably to the best-performing baseline, and some outperform it.
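The quantity plotted in Figure 2 can be computed for a single setting as below. This Python sketch assumes each problem is represented by the list of EOS probabilities from one forward pass on a fully masked 128-token response, and that the "sequence terminus" means the last `tail` positions; the value of `tail` is an assumption, since the caption does not specify it, and the numbers in the example are toy values, not data from the paper.

```python
from statistics import mean
from typing import List, Sequence


def terminus_eos_confidence(eos_probs: Sequence[float], tail: int = 8) -> float:
    """Mean EOS probability over the last `tail` positions (assumed terminus size)."""
    return mean(eos_probs[-tail:])


def sufficiency_gap(
    sufficient: List[Sequence[float]],
    insufficient: List[Sequence[float]],
    tail: int = 8,
) -> float:
    """Mean terminus EOS confidence on length-sufficient problems minus that on
    length-insufficient ones; a positive value corresponds to a green cell."""
    suff = mean(terminus_eos_confidence(p, tail) for p in sufficient)
    insuff = mean(terminus_eos_confidence(p, tail) for p in insufficient)
    return suff - insuff


# Toy 128-position EOS profiles (illustrative only, not the paper's data).
sufficient = [[0.1] * 120 + [0.8] * 8, [0.2] * 120 + [0.6] * 8]
insufficient = [[0.3] * 128]
print(round(sufficiency_gap(sufficient, insufficient), 2))  # 0.4
```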