Text Diffusion with Reinforced Conditioning

Yuxuan Liu; Tianchi Yang; Shaohan Huang; Zihan Zhang; Haizhen Huang; Furu Wei; Weiwei Deng; Feng Sun; Qi Zhang

TL;DR

This work identifies two bottlenecks in applying diffusion models to text: degradation of self-conditioning during training and misalignment between training and sampling. It introduces TReC, a text diffusion model that mitigates the first with Reinforced Conditioning (trained with a reinforcement-learning objective) and the second with Time-Aware Variance Scaling. Experiments on machine translation, paraphrasing, and question generation show that TReC is competitive with autoregressive and non-autoregressive baselines and outperforms several diffusion baselines, and qualitative analysis indicates that it makes better use of the diffusion process for iterative refinement. The findings offer practical guidance for stabilizing training and aligning inference in diffusion-based conditional text generation.

Abstract

Diffusion models have demonstrated exceptional capability in generating high-quality images, videos, and audio. Because they refine their outputs iteratively, they hold strong potential for better non-autoregressive sequence generation. However, existing text diffusion models still fall short in performance because of the difficulty of handling the discreteness of language. This paper thoroughly analyzes text diffusion models and uncovers two significant limitations: degradation of self-conditioning during training and misalignment between training and sampling. Motivated by these findings, we propose a novel text diffusion model called TReC, which mitigates the degradation with Reinforced Conditioning and the misalignment with Time-Aware Variance Scaling. Our extensive experiments demonstrate the competitiveness of TReC against autoregressive, non-autoregressive, and diffusion baselines. Moreover, qualitative analysis shows its advanced ability to fully utilize the diffusion process in refining samples.

Paper Structure (29 sections, 14 equations, 3 figures, 3 tables)

Introduction
Preliminaries
Denoising Diffusion Probabilistic Models
Self-Conditioning
Continuous Diffusion for Text Generation
Pitfalls of Status Quo
Degradation of Self-Conditioning During Training
Misalignment With Training During Sampling
Study on Misalignment During Sampling
Connection Between the Two Limitations
Methods
Reinforced Conditioning
Environment and Agents
Reward and Training Objective
Time-Aware Variance Scaling
...and 14 more sections

Figures (3)

Figure 1: Degradation of Self-Conditioning. (a) Quality advantage ($\Delta$BLEU) from self-conditioning on the validation set during training, which first increases and then decreases. (b) BLEU scores based on inputs $(\forall,z_0,x,t)$ and $(\forall,z_T,x,t)$ constructed from validation samples. This further validates that the model is extremely sensitive to $\hat{z}_0$ (the first term, the previous prediction) and insensitive to $z_t$ (the second term, the noised latent to be denoised).
Figure 2: Illustration of TReC, including Reinforced Conditioning and Time-Aware Variance Scaling.
Figure 3: Degradation Tendency and Training Dynamics for TReC w/ and w/o RL on the Quasar task with 3 different seeds.

Theorems & Definitions (2)

Definition 1: Degradation of Self-Condition
Definition 2: Misalignment During Sampling
