Improving Variable-Length Generation in Diffusion Language Models via Length Regularization

Zicong Cheng, Ruixuan Jia, Jia Li, Guo-Wei Yang, Meng-Hao Guo, Shi-Min Hu

TL;DR

Diffusion large language models (DLLMs) decode on a fixed-length canvas, so when the target length is unknown they tend to under- or over-generate. The authors propose LR-DLLM, a training-free inference framework that treats length as an explicit variable and uses a $\log L$ regularizer to debias confidence across lengths. Length is adapted dynamically in two stages: Stage I probing and Stage II refinement. A length-confidence signal, defined as span-level confidence corrected by $k \log L$, guides both length selection and refinement, with $k$ estimated per instance during probing. LR-DLLM reaches 51.3% Pass@1 on HumanEval-Infilling under fully unknown lengths (+13.4% vs. DreamOn) and 51.5% average Pass@1 on four-language McEval (+14.3% vs. DreamOn), without retraining or modifying the underlying DLLM.

Abstract

Diffusion Large Language Models (DLLMs) are inherently ill-suited for variable-length generation, as their inference is defined on a fixed-length canvas and implicitly assumes a known target length. When the length is unknown, as in realistic completion and infilling, naively comparing confidence across mask lengths becomes systematically biased, leading to under-generation or redundant continuations. In this paper, we show that this failure arises from an intrinsic length-induced bias in generation confidence estimates, leaving existing DLLMs without a robust way to determine generation length and making variable-length inference unreliable. To address this issue, we propose LR-DLLM, a length-regularized inference framework for DLLMs that treats generation length as an explicit variable and achieves reliable length determination at inference time. It decouples semantic compatibility from length-induced uncertainty through an explicit length regularization that corrects biased confidence estimates. Based on this, LR-DLLM enables dynamic expansion or contraction of the generation span without modifying the underlying DLLM or its training procedure. Experiments show that LR-DLLM achieves 51.3% Pass@1 on HumanEval-Infilling under fully unknown lengths (+13.4% vs. DreamOn) and 51.5% average Pass@1 on four-language McEval (+14.3% vs. DreamOn).
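
The length regularization described above can be illustrated with a small sketch. It is not the authors' implementation: the function names (`estimate_k`, `corrected_score`, `select_length`) are hypothetical, and the sign convention is an assumption. Here $k$ is taken to be the least-squares slope of average confidence against $\log L$ over a handful of probe lengths, and the corrected score subtracts that fitted trend, so that what remains reflects how well a span of length $L$ fits the context rather than how long it is.

```python
import math

def estimate_k(lengths, avg_conf):
    """Least-squares slope of average confidence against log(L).

    Assumption: k is fitted per instance from a few probe lengths.
    """
    xs = [math.log(L) for L in lengths]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(avg_conf) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, avg_conf))
    return sxy / sxx

def corrected_score(avg_conf_at_L, L, k):
    """Confidence with the fitted k * log(L) trend removed (assumed form)."""
    return avg_conf_at_L - k * math.log(L)

def select_length(lengths, avg_conf, k):
    """Return the probe length with the highest corrected score."""
    scores = [corrected_score(c, L, k) for L, c in zip(lengths, avg_conf)]
    best = max(range(len(lengths)), key=lambda i: scores[i])
    return lengths[best], scores
```

On synthetic probes where confidence decays logarithmically with $L$ but one length fits the context slightly better, the raw confidence favors the shortest span, while the corrected score recovers the better-fitting length.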

Paper Structure (62 sections, 1 theorem, 28 equations, 5 figures, 10 tables, 2 algorithms)

Key Result

Lemma 6.1

Within a fixed commitment step (i.e., with $\mathbf{p},\mathbf{s}$ held fixed), the greedy update rule never revisits a previous length. Hence, the length trajectory is monotone and terminates at a local maximum.
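
A minimal sketch of why such a search cannot cycle, under assumptions not stated in the lemma itself: the greedy rule moves to a neighboring length ($L \pm 1$) only when the score strictly improves, and the score function is deterministic because $\mathbf{p},\mathbf{s}$ are held fixed. The function name `greedy_length_search` and the step size of 1 are hypothetical; the default upper bound of 128 mirrors the MAX_LENGTH setting mentioned in the Figure 2 caption.

```python
def greedy_length_search(score, start, min_len=1, max_len=128):
    """Hill-climb over integer lengths, accepting only strict improvements.

    `score` is any deterministic function of the length (assumed fixed
    within one commitment step). Returns the final length and the
    sequence of lengths visited.
    """
    current = start
    trajectory = [current]
    while True:
        neighbors = [L for L in (current - 1, current + 1)
                     if min_len <= L <= max_len]
        if not neighbors:
            return current, trajectory
        best = max(neighbors, key=score)
        # Stop when no neighbor is strictly better: a local maximum.
        if score(best) <= score(current):
            return current, trajectory
        current = best
        trajectory.append(current)
```

Because the score strictly increases along the trajectory, stepping back to an earlier length would require that length to score both lower and higher than its neighbor. The direction therefore never reverses, and the search stops at the first length whose neighbors are no better, which may be a local rather than a global maximum.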

Figures (5)

  • Figure 1: Average confidence as a function of mask length L (log-scaled x-axis). The observed trend is well characterized by a logarithmic dependence ($\propto \log L$), while a linear fit provides a noticeably poorer approximation.
  • Figure 2: Ablation of the $\log L$ regularization term in LR-DLLM on HumanEval-Infilling (Pass@1). All methods use the same length adjustment budget with MAX_LENGTH=128.
  • Figure 3: Effect of Stage II (mid-span adjustment) in LR-DLLM on HumanEval-Infilling (Pass@1) for Dream-7B and DreamCoder-7B.
  • Figure 4: Effect of fixed mask length on HumanEval-Infilling. Solid lines denote fixed-length infilling with different mask sizes, while the dashed horizontal line corresponds to LR-DLLM.
  • Figure 5: Per-instance distributions of the overhead ratio $\texttt{Forward Calls}/\texttt{Generated Tokens}$ for three backbones under the same inference and logging protocol. Forward Calls counts model invocations used to evaluate $\mathrm{AVG}_{\mathrm{conf}}(L)$ (and thus $\mathrm{CL}(L)$) during Stage I probing and Stage II greedy local length search, and Generated Tokens is the number of committed tokens produced by Stage II. The right-skewed shapes indicate a stable typical overhead with a small number of hard outliers.
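
The comparison in the Figure 1 caption, a logarithmic fit versus a linear fit of average confidence against mask length, can be reproduced on one's own measurements with a sketch like the following. The helper names (`fit_line`, `r_squared`, `compare_fits`) are hypothetical, and using $R^2$ as the goodness-of-fit measure is an assumption; the caption does not say which criterion was used.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y ~ slope * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line through (xs, ys)."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def compare_fits(lengths, avg_conf):
    """R^2 of confidence regressed on log(L) versus on L itself."""
    return {
        "log": r_squared([math.log(L) for L in lengths], avg_conf),
        "linear": r_squared([float(L) for L in lengths], avg_conf),
    }
```

Given per-length average confidences measured from a model, a noticeably higher `"log"` value than `"linear"` value would be consistent with the logarithmic dependence the caption describes.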

Theorems & Definitions (2)

  • Lemma 6.1: Monotonicity of the length search
  • Proof (Lemma 6.1): Proof sketch