Understanding the Quality-Diversity Trade-off in Diffusion Language Models
Zak Buzzard
TL;DR
This work addresses control of the quality-diversity trade-off in diffusion language models that operate in the continuous embedding space of text. It introduces two inference-time techniques, classifier-free guidance and stochastic clamping, along with a combination of the two, to tune fidelity and diversity without retraining. Using a transformer-based encoder-decoder and an anchor loss with importance sampling, the authors report competitive paraphrasing results on Quora Question Pairs (QQP) after only about three hours of training on a single GPU, and they provide an open-source implementation. The findings indicate that diffusion-based text generation can efficiently cover a broad range of quality and diversity levels, supports length-controllable generation, and compares favorably with state-of-the-art methods. The work also highlights limitations in evaluation and length control as directions for future work.
Abstract
Diffusion models have seen immense success in modelling continuous data across a range of domains such as vision and audio. Despite the challenges of adapting diffusion models to discrete data, recent work explores their application to text generation by working in the continuous embedding space. However, these models lack a natural means to control the inherent trade-off between quality and diversity as afforded by the temperature hyperparameter in autoregressive models, hindering understanding of model performance and restricting generation quality. This work proposes the use of classifier-free guidance and stochastic clamping for manipulating the quality-diversity trade-off on sequence-to-sequence tasks, demonstrating that these techniques may be used to improve the performance of a diffusion language model.
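To make the guidance idea concrete, the following is a minimal sketch of the standard classifier-free guidance combination step, not the author's implementation. The function name `guided_prediction`, the parameter `guidance_scale`, and the toy arrays are illustrative assumptions. Whether the model's outputs are noise estimates or denoised embedding estimates is not stated in this section, so the sketch treats them as generic prediction vectors.

```python
import numpy as np

def guided_prediction(cond_pred, uncond_pred, guidance_scale):
    """Combine conditional and unconditional model outputs (hypothetical helper).

    guidance_scale = 0 returns the unconditional prediction,
    guidance_scale = 1 returns the conditional prediction,
    guidance_scale > 1 extrapolates past the conditional prediction,
    which typically trades diversity for fidelity to the condition.
    """
    cond_pred = np.asarray(cond_pred, dtype=float)
    uncond_pred = np.asarray(uncond_pred, dtype=float)
    # Move from the unconditional output along the direction of the condition.
    return uncond_pred + guidance_scale * (cond_pred - uncond_pred)

# Toy example: two 3-dimensional predictions for a single token position.
cond = np.array([1.0, 2.0, 3.0])    # prediction given the source sentence
uncond = np.array([0.0, 0.0, 0.0])  # prediction with the condition dropped
stronger = guided_prediction(cond, uncond, guidance_scale=2.0)
```

With this parameterisation, a scale of `s` corresponds to `w = s - 1` in the equivalent form `(1 + w) * cond - w * uncond`. The scale is chosen at inference time, so it can play a role loosely similar to that of temperature in autoregressive sampling.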
