Diffusion-NAT: Self-Prompting Discrete Diffusion for Non-Autoregressive Text Generation

Kun Zhou; Yifan Li; Wayne Xin Zhao; Ji-Rong Wen

TL;DR

Diffusion-NAT integrates discrete diffusion with a pretrained seq2seq model (BART) for non-autoregressive (NAR) text generation. By casting diffusion as masked-token recovery and removing time-step embeddings, it unifies the denoising process with PLM inference, which enables iterative self-prompting to refine outputs. Across seven datasets spanning dialog, summarization, and QA, the approach outperforms competitive NAR methods and, on several tasks, matches or exceeds autoregressive baselines. The work highlights the potential of combining discrete diffusion with strong PLMs to achieve higher-quality NAR generation, while acknowledging latency as an area for further improvement.

Abstract

Recently, continuous diffusion models (CDMs) have been introduced into non-autoregressive (NAR) text-to-text generation. However, the discrete nature of text makes it harder for CDMs to generate coherent and fluent text, and also makes CDMs incompatible with advanced NLP techniques, especially the popular pre-trained language models (PLMs). To address this, we propose Diffusion-NAT, which introduces discrete diffusion models (DDMs) into NAR text-to-text generation and integrates BART to improve performance. By revising the decoding process of BART and the typical settings of DDMs, we unify the inference process of BART and the denoising process of DDMs into the same NAR masked-token recovery task. In this way, the DDM can rely on BART to perform denoising, benefiting from both the rich pre-learned knowledge of BART and the iterative refining paradigm of DDMs. In addition, we propose an iterative self-prompting strategy to further improve generation quality. Experimental results on 7 datasets show that our approach can outperform competitive NAR methods, and even surpass autoregressive methods. Our code and data will be publicly released.
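
To make the masked-token recovery view concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a linear masking schedule (mask probability t/T), a hypothetical `denoise(context, prompt, y_t)` callable standing in for BART (encoder input: context plus the previous estimate as prompt; decoder input: the partially masked sequence), and that K self-prompting turns are run after an initial unprompted estimate at each step. The names `add_noise`, `generate`, and `make_toy_denoiser` are illustrative only.

```python
import random

MASK = "[MASK]"

def add_noise(tokens, t, T, rng):
    # Absorbing-state corruption: each token independently becomes [MASK]
    # with probability t / T (linear schedule; an assumption of this sketch).
    return [MASK if rng.random() < t / T else tok for tok in tokens]

def generate(denoise, context, length, T=4, K=1, seed=0):
    # `denoise` is a hypothetical stand-in for the PLM: it receives the
    # context, the previous estimate as a prompt, and the noisy sequence,
    # and returns an estimate of the clean sequence.
    rng = random.Random(seed)
    y_t = [MASK] * length              # start from a fully masked sequence
    y0_hat = []
    for t in range(T, 0, -1):
        prompt = []                    # first pass at each step has no prompt
        for _ in range(K + 1):         # one initial estimate + K prompted turns
            y0_hat = denoise(context, prompt, y_t)
            prompt = y0_hat            # self-prompting: feed the estimate back
        # Re-noise the estimate to the next (lower) noise level; at t = 1
        # the mask probability is 0, so the estimate is returned unchanged.
        y_t = add_noise(y0_hat, t - 1, T, rng)
    return y_t

def make_toy_denoiser(target):
    # Toy oracle that fills every [MASK] with the reference token.
    # A real system would predict these tokens with a trained model.
    def denoise(context, prompt, y_t):
        return [target[i] if tok == MASK else tok for i, tok in enumerate(y_t)]
    return denoise

target = "nice to meet you".split()
out = generate(make_toy_denoiser(target), context=["hello"], length=len(target))
print(out)
```

Because the toy denoiser is an oracle, the loop trivially recovers `target`; the point is only the control flow: estimate the clean sequence, feed the estimate back as a prompt, then re-mask at a lower noise level.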

Paper Structure (39 sections, 12 equations, 2 figures, 11 tables)

Introduction
Related Work
Non-Autoregressive Text Generation.
PLMs for Text Generation.
Diffusion Models for Text Generation.
Preliminary
Problem Statement.
Diffusion Models.
Discrete Diffusion Models.
Approach
Overview
Adapting BART for NAR Generation
BART.
Revised NAR Decoding Process.
Adapting DDM for NAR Generation
...and 24 more sections

Figures (2)

Figure 1: Overview of Diffusion-NAT. We show an example that generates a response at the $t$-th step using $K$-turn self-prompting. The given dialog context and the $K$-turn prompt (i.e., the estimated $\hat{Y}_0$) are fed into the BART encoder, and the response at the $t$-th step, $Y_t$, is fed into the BART decoder to estimate the original tokens.
Figure 2: Performance of our approach w.r.t. the number of training steps on the PersonaChat dataset.
