Quantized Embedding Vectors for Controllable Diffusion Language Models
Cheng Kang, Xinye Chen, Yong Hu, Daniel Novak
TL;DR
This work addresses the high resource demands and limited portability of diffusion language models by proposing QE-CDLM, which remodels the task-specific embedding space through vector quantization and combines diffusion generation with an adaption fine-tuning strategy. The method quantizes embedding vectors to three- or four-state schemes and uses a gradient-based controller to steer generation, aided by fluency regularization and Minimum Bayes Risk decoding. Key contributions include faster convergence, tunable parameters reduced by over 90% with LoRA, and improved perplexity and control on five fine-grained tasks across three datasets, while maintaining competitive fluency. The approach makes controllable diffusion LMs more practical to deploy by improving their portability, speed, and stability in controlled text generation.
Abstract
Improving the controllability, portability, and inference speed of diffusion language models (DLMs) is a key challenge in natural language generation. Although recent language models have shown significant success in complex text generation, their memory and computational requirements remain very demanding, which results in low portability and instability. Numerous well-established neural network quantization methods have been proposed to mitigate these issues. To further enhance the portability of such models for independent deployment and to improve their stability, as measured by language perplexity, we propose a novel approach called the Quantized Embedding Controllable Diffusion Language Model (QE-CDLM). QE-CDLM builds on recent successful controllable DLMs by remodeling the task-specific embedding space via quantization. This yields a gradient-based controller for the generation tasks and more stable intermediate latent variables, which in turn lead to faster convergence and better controllability. Additionally, an adaption fine-tuning method is employed to reduce the number of tunable weights. Experimental results on five challenging fine-grained control tasks demonstrate that QE-CDLM compares favorably to existing methods in terms of quality and feasibility, achieving better perplexity with lightweight fine-tuning.
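To make the idea of quantizing an embedding vector to a small number of states concrete, the following is a minimal sketch of a three-state (ternary) quantizer. It is an illustration under stated assumptions, not the quantizer used by QE-CDLM: the threshold rule, the `threshold_ratio` parameter, the per-vector scale, and the function names `ternary_quantize` and `dequantize` are all hypothetical choices made for this example.

```python
def ternary_quantize(vector, threshold_ratio=0.7):
    """Quantize one embedding vector to states in {-1, 0, +1} plus a scale.

    Assumed scheme (not taken from the paper): entries whose magnitude is
    at most threshold_ratio * mean(|v|) become 0; the rest keep their sign.
    """
    if not vector:
        return [], 0.0
    mean_abs = sum(abs(x) for x in vector) / len(vector)
    delta = threshold_ratio * mean_abs
    states = [0 if abs(x) <= delta else (1 if x > 0 else -1) for x in vector]
    # Scale: mean magnitude of the entries that were not zeroed out.
    kept = [abs(x) for x, s in zip(vector, states) if s != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return states, scale


def dequantize(states, scale):
    """Rebuild an approximate float vector from the states and the scale."""
    return [s * scale for s in states]


# Toy embedding vector, invented for illustration only.
embedding = [0.9, -0.05, 0.4, -1.1, 0.02, -0.6]
states, scale = ternary_quantize(embedding)
approx = dequantize(states, scale)
print(states)  # [1, 0, 1, -1, 0, -1]
```

In this sketch each vector is stored as small integer states plus a single floating-point scale, which is the kind of compact representation that makes an embedding table cheaper to store and move. A four-state scheme would follow the same pattern with a different set of levels.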
