Unifying Continuous and Discrete Text Diffusion with Non-simultaneous Diffusion Processes
Bocheng Li, Zhujin Gao, Linli Xu
TL;DR
The paper addresses the complementary limitations of discrete and continuous diffusion for text by proposing NeoDiff, a unified diffusion framework built on a bi-temporal representation with an extrinsic time $t$ and a token-level intrinsic time $\tau$. It introduces a Poisson forward diffusion process that gives token-level granularity, a context-aware reverse process driven by a transformer-based time predictor, and an extrinsic time schedule tuned with Bayesian optimization. The training objective combines $\mathcal{L}_z$, $\mathcal{L}_\tau$, and $\mathcal{L}_{\mathrm{anchor}}$ to stabilize the embedding space and guide denoising. Empirical results on machine translation, paraphrasing, text simplification, and question generation show that NeoDiff consistently outperforms non-autoregressive continuous and discrete diffusion baselines, iterative methods, and autoregressive diffusion-based methods, while maintaining competitive efficiency and enabling token-level control and diversity.
Abstract
Diffusion models have emerged as a promising approach for text generation, with recent work falling into two main categories: discrete and continuous diffusion models. Discrete diffusion models apply token corruption independently using categorical distributions, allowing for different diffusion progress across tokens but lacking fine-grained control. Continuous diffusion models map tokens to continuous spaces and apply fine-grained noise, but the diffusion progress is uniform across tokens, limiting their ability to capture semantic nuances. To address these limitations, we propose \textbf{\underline{N}}on-simultan\textbf{\underline{e}}ous C\textbf{\underline{o}}ntinuous \textbf{\underline{Diff}}usion Models (NeoDiff), a novel diffusion model that integrates the strengths of both discrete and continuous approaches. NeoDiff introduces a Poisson diffusion process for the forward process, enabling a flexible and fine-grained noising paradigm, and employs a time predictor for the reverse process to adaptively modulate the denoising progress based on token semantics. Furthermore, NeoDiff uses an optimized schedule at inference time to ensure more precise noise control and improved performance. Our approach unifies the theories of discrete and continuous diffusion models, offering a more principled and effective framework for text generation. Experimental results on several text generation tasks demonstrate NeoDiff's superior performance compared to non-autoregressive continuous and discrete diffusion baselines, iterative methods, and autoregressive diffusion-based methods. These results highlight NeoDiff's potential as a powerful tool for generating high-quality text and advancing the field of diffusion-based text generation.
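
To make the idea of non-simultaneous noising concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes that each token draws its own intrinsic time $\tau$ from a Poisson count whose rate grows with the extrinsic time $t$, and that $\tau$ then sets the amount of Gaussian noise mixed into that token's embedding. The function names, the `s_max` parameter, and the linear signal schedule are all hypothetical choices made for illustration.

```python
import numpy as np

def sample_intrinsic_time(t, seq_len, s_max=100, rng=None):
    """Draw a per-token intrinsic time tau in [0, 1] for extrinsic time t in [0, 1].

    Assumption (not from the paper): each token accumulates a Poisson number
    of small noising "ticks" with rate s_max * t; the count is clipped at
    s_max and normalised to [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    ticks = rng.poisson(lam=s_max * t, size=seq_len)
    return np.minimum(ticks, s_max) / s_max

def noise_tokens(z0, tau, rng=None):
    """Noise each token embedding according to its own tau.

    z0 has shape (seq_len, dim); tau has shape (seq_len,).
    The linear schedule alpha = 1 - tau is a hypothetical placeholder.
    """
    rng = np.random.default_rng() if rng is None else rng
    alpha = 1.0 - tau[:, None]            # per-token signal level
    eps = rng.standard_normal(z0.shape)   # Gaussian noise, one draw per entry
    return np.sqrt(alpha) * z0 + np.sqrt(1.0 - alpha) * eps

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z0 = rng.standard_normal((6, 4))      # six toy token embeddings
    tau = sample_intrinsic_time(t=0.5, seq_len=6, rng=rng)
    z_noisy = noise_tokens(z0, tau, rng=rng)
    print("per-token tau:", tau)
    print("noised shape:", z_noisy.shape)
```

In this sketch, tokens in the same sequence generally receive different values of $\tau$ at the same $t$, which mirrors the token-dependent progress of discrete diffusion, while the noise itself stays continuous, as in continuous diffusion.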
