DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models

Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, Lingpeng Kong

TL;DR

DiffuSeq-v2 introduces a learnable soft absorbing state to bridge continuous and discrete diffusion in Seq2Seq text generation, enabling faster training convergence and dramatically accelerated sampling via DPM-solver++ ODE integration. The method jointly denoises a mixture of Gaussian noise and discrete absorbing noise, which aligns training with inference and removes the need for MBR decoding during sampling. Empirical results on QQP show comparable or better quality with substantial speedups: training converges roughly 4x faster, and sampling is about 800x faster than the original DiffuSeq while preserving quality. By reducing computational overhead, the approach makes diffusion-based conditional text generation more practical.

Abstract

Diffusion models have gained prominence in generating high-quality sequences of text. Nevertheless, current approaches predominantly represent discrete text within a continuous diffusion space, which incurs substantial computational overhead during training and results in slower sampling speeds. In this paper, we introduce a soft absorbing state that facilitates the diffusion model in learning to reconstruct discrete mutations based on the underlying Gaussian space, thereby enhancing its capacity to recover conditional signals. During the sampling phase, we employ state-of-the-art ODE solvers within the continuous space to expedite the sampling process. Comprehensive experimental evaluations reveal that our proposed method effectively accelerates the training convergence by 4x and generates samples of similar quality 800x faster, rendering it significantly closer to practical application. The code is released at https://github.com/Shark-NLP/DiffuSeq.
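
To make the idea of mixing Gaussian noise with a discrete absorbing state concrete, here is a minimal sketch of one forward (corruption) step. It is an illustration under stated assumptions, not the released implementation: the function name `mixed_noise`, the parameter names `alpha_bar` and `gamma`, and the choice to replace whole token positions independently are all hypothetical. In a real model the absorbing vector would be a learned parameter; here it is a fixed list.

```python
import math
import random


def mixed_noise(x0, absorbing, alpha_bar, gamma, rng):
    """Corrupt token embeddings with Gaussian noise plus an absorbing state.

    x0        : list of token embeddings (each a list of floats)
    absorbing : soft absorbing-state vector (assumed fixed here, learnable in a model)
    alpha_bar : cumulative signal level at the current step, in (0, 1]
    gamma     : probability of replacing a position with the absorbing state
    rng       : a random.Random instance, passed in for reproducibility
    """
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    xt, replaced = [], []
    for emb in x0:
        # Continuous part: the usual Gaussian forward step on the embedding.
        noisy = [signal * v + noise * rng.gauss(0.0, 1.0) for v in emb]
        # Discrete part: with probability gamma, overwrite the whole position.
        absorb = rng.random() < gamma
        xt.append(list(absorbing) if absorb else noisy)
        replaced.append(absorb)
    return xt, replaced


# Hypothetical usage with three 2-dimensional token embeddings.
rng = random.Random(0)
x0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
xt, replaced = mixed_noise(x0, absorbing=[0.5, 0.5], alpha_bar=0.9, gamma=0.5, rng=rng)
```

A denoiser trained on inputs like `xt` has to recover both the Gaussian-perturbed positions and the fully replaced ones, which is one way to read the abstract's "reconstruct discrete mutations based on the underlying Gaussian space". Setting `gamma` to 0 reduces the sketch to plain Gaussian corruption.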

Paper Structure

This paper contains 26 sections, 16 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Training and sampling stages with discrete noise, which helps the two stages align better.
  • Figure 2: The test BLEU score along with training hours under different training schemes.
  • Figure 3: Generation speed and quality under different sampling steps incorporating DPM-solver.
  • Figure 4: The test BLEU score at different training hours for different settings of the ratio $\gamma$.