TEncDM: Understanding the Properties of the Diffusion Model in the Space of Language Model Encodings

Alexander Shabalin, Viacheslav Meshchaninov, Egor Chimbulatov, Vladislav Lapikov, Roman Kim, Grigory Bartosh, Dmitry Molchanov, Sergey Markov, Dmitry Vetrov

TL;DR

This work addresses the limitations of autoregressive text generation by proposing the Text Encoding Diffusion Model (TEncDM), which performs diffusion in the latent space of pre-trained language-model encodings and employs a context-aware Transformer decoder. Key contributions include demonstrating that encodings outperform embeddings for diffusion, introducing a Transformer-based decoder trained with corruption-aware objectives, and analyzing self-conditioning and a tan-d noise scheduler to optimize denoising dynamics. Through extensive ablations on ROCStories and Wikipedia, the authors show that TEncDM outperforms embedding-based diffusion models and is competitive with autoregressive baselines on several tasks, while enabling faster, non-autoregressive generation. The approach offers practical benefits for conditional text generation across paraphrasing, summarization, and simplification, highlighting the importance of encoding-space diffusion, robust decoding, and carefully designed noise schedules.

Abstract

This paper presents the Text Encoding Diffusion Model (TEncDM), a novel approach to diffusion modeling that operates in the space of pre-trained language model encodings. In contrast to traditionally used embeddings, encodings integrate contextual information. In our approach, we also employ a transformer-based decoder, specifically designed to incorporate context in the token prediction process. We conduct a comprehensive examination of the influence of the encoder, decoder, noise scheduler, and self-conditioning on zero-shot generation. Furthermore, we compare TEncDM with previous approaches on three conditional text generation tasks: QQP, XSum, and Wiki-Auto. The results show that TEncDM exhibits superior performance compared to existing non-autoregressive diffusion models. Our code is available at https://github.com/M0RJIQUE/tencdm.
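
As a rough illustration of what adding noise to an encoding could look like, the sketch below assumes a standard variance-preserving forward process, z_t = sqrt(alpha_t) * z_0 + sqrt(1 - alpha_t) * eps, together with a tangent-based schedule of the form alpha_t = 1 / (1 + d^2 * tan^2(pi * t / 2)). Neither formula is stated in the summary above, so both are assumptions that should be checked against the paper and its code. The function names alpha_tan_d and add_noise, the toy encoding, and the values of d are hypothetical.

```python
import math
import random


def alpha_tan_d(t: float, d: float) -> float:
    """Signal fraction alpha_t for t in [0, 1].

    Assumed tangent-based form (not taken from the summary above):
    alpha_t = 1 / (1 + d^2 * tan^2(pi * t / 2)).
    Larger d removes signal faster for the same t.
    """
    return 1.0 / (1.0 + (math.tan(t * math.pi / 2.0) ** 2) * d ** 2)


def add_noise(z0, t, d, rng=None):
    """Noise one encoding vector z0 (a list of floats) to time t.

    Assumes a variance-preserving forward process:
    z_t = sqrt(alpha_t) * z0 + sqrt(1 - alpha_t) * eps, eps ~ N(0, 1).
    """
    rng = rng or random.Random(0)
    a = alpha_tan_d(t, d)
    return [
        math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
        for x in z0
    ]


if __name__ == "__main__":
    z0 = [0.5, -1.0, 2.0]  # toy stand-in for one token's encoding
    for t in (0.0, 0.25, 0.5, 0.75):
        print(t, round(alpha_tan_d(t, d=3.0), 4), add_noise(z0, t, d=3.0))
```

Under these assumptions, d controls how quickly the signal is destroyed: for a fixed t, a larger d gives a smaller alpha_t and therefore a noisier z_t.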

Paper Structure

This paper contains 56 sections, 3 equations, 13 figures, 11 tables.

Figures (13)

  • Figure 1: Overview of our framework design for conditional generation. Top is the training process, bottom is the generation process.
  • Figure 2: Generation quality of diffusion models with and without self-conditioning on the ROCStories dataset.
  • Figure 3: Prediction magnitudes for generation processes with different numbers of steps on the ROCStories dataset.
  • Figure 4: Reconstruction loss and reconstruction accuracy of diffusion models trained with different noise schedulers on the ROCStories dataset.
  • Figure 5: The dependence between the generation quality and the maximum amount of noise in $z_t$ during the decoder training.
  • ...and 8 more figures