TEncDM: Understanding the Properties of the Diffusion Model in the Space of Language Model Encodings
Alexander Shabalin, Viacheslav Meshchaninov, Egor Chimbulatov, Vladislav Lapikov, Roman Kim, Grigory Bartosh, Dmitry Molchanov, Sergey Markov, Dmitry Vetrov
TL;DR
This work addresses the limitations of autoregressive text generation by proposing the Text Encoding Diffusion Model (TEncDM), which performs diffusion in the latent space of pre-trained language-model encodings and employs a context-aware Transformer decoder. Key contributions include demonstrating that encodings outperform embeddings for diffusion, introducing a Transformer-based decoder trained with corruption-aware objectives, and analyzing self-conditioning and a tan-d noise scheduler to optimize denoising dynamics. Through extensive ablations on ROCStories and Wikipedia, the authors show that TEncDM outperforms embedding-based diffusion models and is competitive with autoregressive baselines on several tasks, while enabling faster, non-autoregressive generation. The approach offers practical benefits for conditional text generation across paraphrasing, summarization, and simplification, highlighting the importance of encoding-space diffusion, robust decoding, and carefully designed noise schedules.
Abstract
This paper presents the Text Encoding Diffusion Model (TEncDM), a novel approach to diffusion modeling that operates in the space of pre-trained language model encodings. In contrast to traditionally used embeddings, encodings integrate contextual information. In our approach, we also employ a transformer-based decoder, specifically designed to incorporate context in the token prediction process. We conduct a comprehensive examination of the influence of the encoder, decoder, noise scheduler, and self-conditioning on zero-shot generation. Furthermore, we compare TEncDM with previous approaches on three conditional text generation tasks: QQP, XSum, and Wiki-Auto. The results show that TEncDM exhibits superior performance compared to existing non-autoregressive diffusion models. Our code is available at https://github.com/M0RJIQUE/tencdm.
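To make the role of the noise scheduler concrete, the following is a minimal sketch of how a sequence of encodings might be corrupted at diffusion time t. It is not taken from the authors' code. The functional form of the tan-d schedule, alpha_t = 1 / (1 + tan(pi*t/2)^2 * d^2), is an assumption that this section does not state, and the names tan_d_alpha and add_noise, the value d = 9, and the toy encodings are hypothetical. Under this assumed form, d = 1 reduces to the cos^2(pi*t/2) schedule, and larger d removes signal earlier in the process.

```python
import math
import random


def tan_d_alpha(t: float, d: float = 1.0) -> float:
    """Signal level alpha_t for t in [0, 1].

    Assumed form (not given in this section): 1 / (1 + tan(pi*t/2)^2 * d^2).
    """
    if t >= 1.0:
        return 0.0  # tan diverges at t = 1, so no signal remains
    return 1.0 / (1.0 + (math.tan(math.pi * t / 2.0) * d) ** 2)


def add_noise(encodings, t, d=1.0, rng=random):
    """Variance-preserving corruption of a sequence of token encodings:
    z_t = sqrt(alpha_t) * z_0 + sqrt(1 - alpha_t) * eps, with eps ~ N(0, I).
    """
    alpha = tan_d_alpha(t, d)
    signal, noise = math.sqrt(alpha), math.sqrt(1.0 - alpha)
    return [
        [signal * x + noise * rng.gauss(0.0, 1.0) for x in token]
        for token in encodings
    ]


# Toy stand-in for encoder output: 3 tokens, 4 dimensions each (hypothetical values).
z0 = [[0.5, -1.0, 0.25, 2.0] for _ in range(3)]

# Illustrative choice of d; a larger d destroys signal faster than d = 1.
zt = add_noise(z0, t=0.3, d=9.0, rng=random.Random(0))
```

In a full model, a denoiser would be trained to recover z0 from zt, and a separate decoder would map the recovered encodings back to tokens; both are omitted here.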
