Text Diffusion with Reinforced Conditioning
Yuxuan Liu, Tianchi Yang, Shaohan Huang, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang
TL;DR
This work tackles the core challenge of applying diffusion models to text by identifying two bottlenecks: degradation of self-conditioning during training and misalignment between training and sampling. It introduces TREC, a text diffusion model that counters the self-conditioning degradation with Reinforced Conditioning, trained through a reinforcement-learning objective, and counters the training-sampling misalignment with Time-Aware Variance Scaling. Experiments on machine translation, paraphrasing, and question generation show that TREC is competitive with autoregressive and non-autoregressive baselines and outperforms several diffusion baselines. Qualitative analysis further shows that it makes better use of the diffusion process for iterative refinement. The approach offers practical guidance for stabilizing training and aligning inference in diffusion-based natural language generation (NLG), with implications for high-quality conditional text generation.
Abstract
Diffusion models have demonstrated exceptional capability in generating high-quality images, videos, and audio. Because they refine their outputs iteratively, they hold strong potential for better non-autoregressive sequence generation. However, existing text diffusion models still underperform because of the difficulty of handling the discreteness of language. This paper analyzes text diffusion models in depth and uncovers two significant limitations: degradation of self-conditioning during training and misalignment between training and sampling. Motivated by these findings, we propose a novel Text Diffusion model called TREC, which mitigates the degradation with Reinforced Conditioning and the misalignment with Time-Aware Variance Scaling. Extensive experiments demonstrate that TREC is competitive with autoregressive, non-autoregressive, and diffusion baselines. Moreover, qualitative analysis shows that it makes fuller use of the diffusion process when refining samples.
