Controllable Stylistic Text Generation with Train-Time Attribute-Regularized Diffusion

Fan Zhou, Chang Tian, Tim Van de Cruys

TL;DR

This paper tackles the challenge of steering diffusion-based text generation toward target styles while preserving content. It introduces RegDiff, a latent-space diffusion framework that couples a frozen VAE encoder–decoder with a latent diffusion model and imposes attribute supervision during training, regularizing the latent space so that no pretrained classifier is needed at sampling time. Across five style-transfer datasets, RegDiff achieves competitive style-transfer accuracy and preserves semantic content, with some trade-offs in surface fluency due to non-autoregressive decoding. The results indicate that training-time regularization in the latent space can yield efficient, scalable, and generalizable attribute control for diffusion-based NLP generation, reducing dependence on inference-time classifiers.

Abstract

Generating stylistic text with specific attributes is a key problem in controllable text generation. Recently, diffusion models have emerged as a powerful paradigm for both visual and textual generation. Existing approaches can be broadly categorized into classifier-free guidance (CFG) and classifier guidance (CG) methods. While CFG effectively preserves semantic content, it often fails to provide effective attribute control. In contrast, CG modifies the denoising trajectory using classifier gradients, enabling better attribute alignment but incurring high computational costs during sampling and suffering from classifier generalization issues. In this work, we propose RegDiff, a regularized diffusion framework that leverages attribute features without requiring a pretrained classifier during sampling, thereby achieving controllable generation with reduced computational costs. Specifically, RegDiff employs a VAE-based encoder–decoder architecture to ensure reconstruction fidelity and a latent diffusion model trained with attribute supervision to enable controllable text generation. Attribute information is injected only during training. Experiments on five datasets spanning multiple stylistic attributes demonstrate that RegDiff outperforms strong baselines in generating stylistic texts. These results validate the effectiveness of RegDiff as an efficient solution for attribute-controllable text diffusion. Our code, datasets, and resources will be released upon publication at https://github.com/xxxx.
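
The idea of injecting attribute information only during training can be illustrated with a toy objective. The sketch below is not the paper's formulation: the loss form, the linear attribute head (`attribute_logits`), and the weight `lam` are all hypothetical choices made for illustration. It adds a cross-entropy attribute term to a standard denoising loss on a latent vector; at sampling time only the denoiser would be run, so no classifier gradient is needed.

```python
import math
import random


def add_noise(z0, alpha_bar, rng):
    """Forward diffusion step: z_t = sqrt(alpha_bar) * z0 + sqrt(1 - alpha_bar) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(alpha_bar) * z + math.sqrt(1.0 - alpha_bar) * e
          for z, e in zip(z0, eps)]
    return zt, eps


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_norm - logits[label]


def attribute_logits(z, weights):
    """Hypothetical linear attribute head: one weight vector per style class."""
    return [sum(w_i * z_i for w_i, z_i in zip(w, z)) for w in weights]


def regularized_loss(z0, z0_pred, label, head_weights, lam=0.5):
    """Denoising loss plus an assumed train-time attribute regularizer.

    Returns (total, denoising term, attribute term).
    """
    denoise = mse(z0_pred, z0)
    attr = cross_entropy(attribute_logits(z0_pred, head_weights), label)
    return denoise + lam * attr, denoise, attr


if __name__ == "__main__":
    rng = random.Random(0)
    z0 = [0.5, -1.0, 0.25, 2.0]           # stand-in for a VAE latent
    zt, _ = add_noise(z0, alpha_bar=0.9, rng=rng)
    z0_pred = zt                           # stand-in for a denoiser's output
    head = [[1.0, 0.0, 0.0, 0.0], [-1.0, 0.0, 0.0, 0.0]]
    total, denoise, attr = regularized_loss(z0, z0_pred, label=0, head_weights=head)
    print(f"total={total:.4f} denoise={denoise:.4f} attr={attr:.4f}")
```

In this toy setup the attribute head appears only inside the training loss, which is the contrast with classifier guidance that the abstract draws: CG would instead query a classifier's gradient at every sampling step.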

Paper Structure

This paper contains 46 sections, 11 equations, 7 figures, 7 tables.

Figures (7)

  • Figure 1: Sentiment data without inductive bias.
  • Figure 2: Formality data without inductive bias.
  • Figure 3: A graphical representation of the RegDiff framework.
  • Figure 4: The two panels show inductive-biased formality clusters and inductive-biased authorship clusters.
  • Figure 5: The two panels show inductive-biased formality clusters with the decoded texts' style clusters, and inductive-biased authorship clusters with the decoded texts' style clusters. Classes 0-3 represent style A, style B, predicted style A, and predicted style B.
  • ...and 2 more figures