Unifying Continuous and Discrete Text Diffusion with Non-simultaneous Diffusion Processes

Bocheng Li, Zhujin Gao, Linli Xu

TL;DR

The paper addresses the limitations of both discrete and continuous diffusion for text by proposing NeoDiff, a unified diffusion framework built on a bi-temporal representation with extrinsic time $t$ and token-level intrinsic time $\tau$. It introduces a Poisson forward diffusion process that provides token-level granularity, a context-aware reverse process driven by a transformer-based time predictor, and an extrinsic time schedule optimized with Bayesian optimization. The training objective combines $\mathcal{L}_z$, $\mathcal{L}_\tau$, and $\mathcal{L}_{\mathrm{anchor}}$ to stabilize the embedding space and guide denoising. Empirical results on machine translation, paraphrasing, text simplification, and question generation show that NeoDiff consistently outperforms non-autoregressive diffusion, iterative, and autoregressive diffusion baselines, while maintaining competitive efficiency and enabling token-level control and diversity.

Abstract

Diffusion models have emerged as a promising approach for text generation, with recent works falling into two main categories: discrete and continuous diffusion models. Discrete diffusion models apply token corruption independently using categorical distributions, allowing for different diffusion progress across tokens but lacking fine-grained control. Continuous diffusion models map tokens to continuous spaces and apply fine-grained noise, but the diffusion progress is uniform across tokens, limiting their ability to capture semantic nuances. To address these limitations, we propose Non-simultaneous Continuous Diffusion Models (NeoDiff), a novel diffusion model that integrates the strengths of both discrete and continuous approaches. NeoDiff introduces a Poisson diffusion process for the forward process, enabling a flexible and fine-grained noising paradigm, and employs a time predictor for the reverse process to adaptively modulate the denoising progress based on token semantics. Furthermore, NeoDiff utilizes an optimized schedule for inference to ensure more precise noise control and improved performance. Our approach unifies the theories of discrete and continuous diffusion models, offering a more principled and effective framework for text generation. Experimental results on several text generation tasks demonstrate NeoDiff's superior performance compared to baselines of non-autoregressive continuous and discrete diffusion models, iterative-based methods, and autoregressive diffusion-based methods. These results highlight NeoDiff's potential as a powerful tool for generating high-quality text and advancing the field of diffusion-based text generation.

Paper Structure

This paper contains 37 sections, 20 equations, 3 figures, 16 tables.

Figures (3)

  • Figure 1: Comparison of the noising paradigms employed by Non-simultaneous Continuous Diffusion (NeoDiff) and two other diffusion models. The color intensity on the text tokens represents the token-level noising progress (intrinsic time $\tau$). Discrete diffusion applies an independent but coarse-grained noising paradigm to each token within a sentence. In contrast, continuous diffusion uses a fine-grained noising schedule but applies it uniformly across all tokens. NeoDiff assigns an independent, fine-grained intrinsic time $\tau$ to each token, with a finer noising schedule in extrinsic time $t$.
  • Figure 2: An overview of NeoDiff.
  • Figure 3: Prompt templates used for LLM-based evaluation. Top: Translation evaluation prompt. Bottom: Paraphrase evaluation prompt.
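
The three noising paradigms contrasted in the Figure 1 caption can be illustrated with a small sketch that assigns an intrinsic time $\tau$ to each token at a given extrinsic time $t$. This is only an illustration under stated assumptions, not the paper's formulation: the function names, the `levels` parameter, and the choice to normalize a clipped Poisson count into $[0, 1]$ are all hypothetical, and the paper's actual Poisson process and schedule may differ.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) sample with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def intrinsic_times(mode, t, num_tokens, rng, levels=20):
    """Return one intrinsic time tau in [0, 1] per token at extrinsic time t.

    `levels` (hypothetical) is the number of noise steps a token can take
    in the non-simultaneous case; it is not a parameter from the paper.
    """
    if mode == "continuous":
        # Fine-grained, but every token shares the same progress.
        return [t] * num_tokens
    if mode == "discrete":
        # Independent per token, but only two states: clean or corrupted.
        return [1.0 if rng.random() < t else 0.0 for _ in range(num_tokens)]
    if mode == "non_simultaneous":
        # Independent per token and fine-grained: a Poisson count whose
        # rate grows with t, clipped and normalized (assumed mapping).
        return [
            min(sample_poisson(levels * t, rng), levels) / levels
            for _ in range(num_tokens)
        ]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    rng = random.Random(0)
    for mode in ("discrete", "continuous", "non_simultaneous"):
        print(mode, intrinsic_times(mode, t=0.5, num_tokens=6, rng=rng))
```

Running the script prints one list of per-token $\tau$ values for each paradigm: the discrete values are all 0 or 1, the continuous values are identical, and the non-simultaneous values vary across tokens in steps of `1 / levels`.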