Diffusion Models for Non-autoregressive Text Generation: A Survey
Yifan Li, Kun Zhou, Wayne Xin Zhao, Ji-Rong Wen
TL;DR
This survey analyzes diffusion-model-based approaches to non-autoregressive text generation, detailing the forward and reverse processes and distinguishing discrete-token from continuous-embedding formulations. It surveys core design choices (denoising networks, noise schedules, objective functions, and conditioning strategies) and describes how PLMs are integrated, either as denoisers or via latent-space diffusion, including task-specific pre-training adaptations. The paper highlights key benefits for NAR tasks, such as constrained iterative refinement and intermediate controllability, while summarizing optimization tricks (clamping, self-conditioning, semi-NAR decoding) and training techniques (importance timestep sampling). It concludes with future directions, including tailored noise schedules, better PLM integration, unified multimodal diffusion models, and alignment with human values. Overall, the work provides a structured reference for researchers designing and evaluating diffusion-based NAR text generation systems and working toward more efficient, controllable, and multimodal capabilities.
Abstract
Non-autoregressive (NAR) text generation has attracted much attention in the field of natural language processing because it greatly reduces inference latency, although at the cost of generation accuracy. Recently, diffusion models, a class of latent variable generative models, have been introduced into NAR text generation and have shown improved text generation quality. In this survey, we review the recent progress in diffusion models for NAR text generation. As background, we first present the general definition of diffusion models and of text diffusion models, and then discuss their merits for NAR generation. As the core content, we introduce the two mainstream types of diffusion models in existing work on text diffusion and review the key designs of the diffusion process. Moreover, we discuss the utilization of pre-trained language models (PLMs) for text diffusion models and introduce optimization techniques for text data. Finally, we discuss several promising directions and conclude the paper. Our survey aims to provide researchers with a systematic reference to related research on text diffusion models for NAR generation. We present our collection of text diffusion models at https://github.com/RUCAIBox/Awesome-Text-Diffusion-Models.
