Conditional [MASK] Discrete Diffusion Language Model

Hyukhun Koh, Minha Jhang, Dohyung Kim, Sangmook Lee, Kyomin Jung

TL;DR

This work presents Diffusion-EAGS, a framework that fuses conditional masked language models with discrete diffusion models via a conditional Markov Random Field to achieve high-quality, diverse, and controllable text generation. It introduces two mechanisms. Entropy-Adaptive Gibbs Sampling (EAGS) performs stepwise, uncertainty-driven updates at inference. Entropy-based Noise Scheduling (ENS) provides structured denoising during training. An accompanying energy-based interpretation guarantees progressive energy reduction. The approach is evaluated on RocStories and Paradetox against auto-regressive models (ARMs), conditional masked language models (CMLMs), and discrete diffusion language models (DDLMs). It achieves the best quality-diversity tradeoff and demonstrates robust keyword-based control. The findings suggest that integrating masked language models (MLMs) into diffusion frameworks, guided by entropy-aware strategies, can mitigate degeneration in conditional generation and offer practical benefits for controllable NLP applications. Limitations include the lack of experiments with other pre-trained language models (PLMs) and with tasks beyond generation. Future work could adapt the framework to encoder-decoder PLMs and broader NLP tasks.
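The EAGS inference loop can be sketched in plain Python. Following the Figure 2 caption, it starts from a fully masked sequence and repeatedly refines the high-entropy positions selected by a threshold τ_t. This is a minimal illustration under stated assumptions, not the authors' implementation:

- `predict` is a hypothetical stand-in for the conditional MLM, with the condition Y assumed to be built into it.
- `thresholds` is a hypothetical per-step schedule for τ_t.
- The selected set M_t is re-masked and resampled jointly as a block update, rather than one position at a time as in strict Gibbs sampling.

```python
import math
import random
from typing import Callable, List, Sequence

MASK = -1  # hypothetical id standing in for the [MASK] token

# Hypothetical interface: predict(tokens) returns one categorical distribution
# per position; the condition Y is assumed to be baked into predict.
Predictor = Callable[[List[int]], List[List[float]]]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a single categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def eags_sample(predict: Predictor, length: int,
                thresholds: Sequence[float], seed: int = 0) -> List[int]:
    """Entropy-adaptive Gibbs-style refinement starting from all [MASK].

    At each step, positions that are still masked or whose predictive entropy
    is at least the current threshold tau form M_t; they are re-masked and
    resampled jointly (a block update, not one-site-at-a-time Gibbs).
    """
    rng = random.Random(seed)
    tokens = [MASK] * length
    for tau in thresholds:
        probs = predict(tokens)
        ents = [entropy(p) for p in probs]
        selected = [i for i, tok in enumerate(tokens)
                    if tok == MASK or ents[i] >= tau]
        if not selected:  # every position is confident enough: stop early
            break
        for i in selected:  # re-mask M_t so the model conditions on the rest
            tokens[i] = MASK
        probs = predict(tokens)
        for i in selected:  # draw each selected position from its distribution
            tokens[i] = rng.choices(range(len(probs[i])), weights=probs[i])[0]
    return tokens


def toy_predict(tokens: List[int]) -> List[List[float]]:
    """Toy model: position 0 is certain, the rest are uniform over 3 tokens."""
    return [[1.0, 0.0, 0.0] if i == 0 else [1 / 3, 1 / 3, 1 / 3]
            for i in range(len(tokens))]


if __name__ == "__main__":
    print(eags_sample(toy_predict, length=4, thresholds=[1.0, 0.8, 0.6]))
```

With the toy model, position 0 has zero entropy and is fixed after the first step. The uniform positions have entropy ln 3 ≈ 1.10, which exceeds every threshold, so they are resampled at every step. A decreasing threshold schedule is one plausible reading of τ_t, not a detail confirmed here.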

Abstract

Although auto-regressive models excel in natural language processing, they often struggle to generate diverse text and offer limited controllability. Non-auto-regressive methods are a possible alternative, but they often produce degenerate outputs and fall short in conditional generation. To address these challenges, we propose Diffusion-EAGS, a novel framework that integrates conditional masked language models into diffusion language models through the theoretical lens of a conditional Markov Random Field. Within this framework, we introduce entropy-adaptive Gibbs sampling and entropy-based noise scheduling to counterbalance each model's shortcomings. Experimental results show that Diffusion-EAGS outperforms baselines and achieves the best quality-diversity tradeoff, demonstrating its effectiveness in non-auto-regressive text generation.
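Entropy-based noise scheduling uses the position entropy H(x_i) to decide which tokens are masked at each training timestep (Figure 2). The sketch below makes two assumptions that the paper's actual schedule and ordering may not share:

- The number of masked tokens grows linearly with t.
- The highest-entropy positions are masked first.

`ens_noise`, `masked_cross_entropy`, and the `MASK` id are hypothetical names. The entropies are assumed to be precomputed, for example from a pretrained MLM. The loss is a plain masked cross-entropy rather than the paper's exact diffusion loss.

```python
import math
from typing import List, Sequence

MASK = -1  # hypothetical id standing in for the [MASK] token


def ens_noise(tokens: Sequence[int], entropies: Sequence[float],
              t: int, num_steps: int) -> List[int]:
    """Mask tokens for training timestep t in [0, num_steps].

    Assumptions (not taken from the paper): the number of masked positions
    grows linearly with t, and higher-entropy positions are masked first,
    so during denoising they are recovered last.
    """
    if num_steps <= 0 or not 0 <= t <= num_steps:
        raise ValueError("need num_steps > 0 and 0 <= t <= num_steps")
    if len(tokens) != len(entropies):
        raise ValueError("one entropy value per token is required")
    k = math.ceil(len(tokens) * t / num_steps)  # number of positions to mask
    # Stable sort: ties keep their original left-to-right order.
    order = sorted(range(len(tokens)), key=lambda i: entropies[i], reverse=True)
    masked = set(order[:k])
    return [MASK if i in masked else tok for i, tok in enumerate(tokens)]


def masked_cross_entropy(probs: Sequence[Sequence[float]],
                         targets: Sequence[int],
                         noised: Sequence[int]) -> float:
    """Mean negative log-likelihood of the targets at masked positions only."""
    losses = [-math.log(probs[i][targets[i]])
              for i, tok in enumerate(noised) if tok == MASK]
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    x = [10, 11, 12, 13]
    h = [0.1, 2.0, 0.5, 1.0]  # made-up per-position entropies
    for step in range(5):
        print(step, ens_noise(x, h, step, num_steps=4))
```

Under these assumptions, the noisiest timesteps keep only low-entropy (more predictable) tokens visible. The denoiser therefore learns to recover uncertain positions from confident context.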

Paper Structure

This paper contains 74 sections, 12 equations, 6 figures, 22 tables, and 2 algorithms.

Figures (6)

  • Figure 1: Overview of how our approach (Diffusion-EAGS) combines the strengths of MLM and diffusion-based models to overcome the limitations of AR models, achieving a better diversity-quality tradeoff and fine-grained controllability.
  • Figure 2: Overview of the training (forward) and inference (backward) processes in Diffusion-EAGS. Training (left): Entropy-based Noise Scheduling (ENS) determines which tokens in the masked sequence, denoted by $[M]$, should be denoised at each timestep based on the position entropy $H(x_i)$. These tokens are then generated using the diffusion model with parameters $\theta$, and the loss is computed using a cross-entropy (C.E.) diffusion loss. Inference (right): Starting from a fully masked sequence conditioned on $Y$, Entropy-Adaptive Gibbs Sampling (EAGS) iteratively refines the sequence by focusing on high-entropy tokens, denoted as $M_t$, based on a threshold $\tau_t$, yielding stable and coherent text generation.
  • Figure 3: Quality-diversity tradeoff across various models. The x-axis ($1/\text{PPL}$) reflects generation quality, while the y-axis (VSemb) indicates diversity. Green points represent AR models, yellow points represent diffusion models, and blue points represent CMLMs. Our Diffusion-EAGS variants, marked by purple stars, achieve the best overall tradeoff.
  • Figure 4: When a condition is provided, the distribution of potential values for the samples is shifted on a logarithmic scale.
  • Figure 5: Entropy behavior tracked during the generation and training processes.
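Figure 3 measures quality as 1/PPL. Perplexity is the exponential of the mean per-token negative log-likelihood, so 1/PPL equals the geometric mean of the per-token probabilities. The sketch below makes two assumptions:

- It uses natural-log token probabilities from some scoring language model, which this summary does not specify.
- It averages within a single sample.

`perplexity` and `quality_score` are illustrative names, and the sketch does not reproduce the VSemb diversity metric.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """PPL = exp(-mean log p) over the scored tokens (natural log)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def quality_score(token_logprobs: Sequence[float]) -> float:
    """The Figure 3 x-axis quantity: 1 / PPL (higher means better quality)."""
    return 1.0 / perplexity(token_logprobs)


if __name__ == "__main__":
    # Every token assigned probability 0.5 gives PPL 2 and 1/PPL 0.5.
    lp = [math.log(0.5)] * 4
    print(perplexity(lp), quality_score(lp))
```

Because 1/PPL is bounded in (0, 1], it places all models on a common axis where larger values are better, matching the orientation of the diversity axis in the plot.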