On Variational Learning of Controllable Representations for Text without Supervision

Peng Xu, Jackie Chi Kit Cheung, Yanshuai Cao

TL;DR

This work identifies latent vacancy as a core obstacle to unsupervised controllable text generation with sequence VAEs: manipulated latent codes often land in low-density regions of the aggregated posterior, and the model fails to decouple style from content. To address this, the authors propose CP-VAE, which constrains the mean of a designated latent subspace to a learned K-dimensional probability simplex and encourages the simplex to be well filled via regularizers and a structured reconstruction loss. By decomposing latent codes into controllable and residual parts and training within the constrained space, CP-VAE achieves the first successful unsupervised controllable text representations. It outperforms unsupervised baselines on text style transfer, approaches supervised methods, and enables fine-grained style transitions such as topic changes mid-generation. The constraint is motivated by topological and density-based analyses, and the method shows strong empirical results on the Yelp, Amazon, and AG News datasets, highlighting its potential to reduce labeling needs and expand controllable generation capabilities.

Abstract

The variational autoencoder (VAE) can learn the manifold of natural images on certain datasets, as evidenced by meaningful interpolation or extrapolation in the continuous latent space. However, on discrete data such as text, it is unclear whether unsupervised learning can discover a similar latent space that allows controllable manipulation. In this work, we find that sequence VAEs trained on text fail to properly decode when the latent codes are manipulated, because the modified codes often land in holes or vacant regions in the aggregated posterior latent space, where the decoding network fails to generalize. Both as a validation of the explanation and as a fix to the problem, we propose to constrain the posterior mean to a learned probability simplex, and perform manipulation within this simplex. Our proposed method mitigates the latent vacancy problem and achieves the first success in unsupervised learning of controllable representations for text. Empirically, our method outperforms unsupervised baselines and strong supervised approaches on text style transfer, and is capable of performing more flexible fine-grained control over text generation than existing methods.
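
The simplex constraint described in the abstract can be illustrated with a minimal sketch. It assumes the constrained mean is a convex combination of K basis vectors with softmax weights; the function names, the toy basis, and the logits are hypothetical, the basis is fixed here rather than learned, and the regularizers and structured reconstruction loss are omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax; the result is a point on the probability simplex."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def simplex_mean(logits, basis):
    """Return (p, mu), where mu = sum_k p[k] * basis[k].

    `basis` is a list of K vectors (assumed learned in a real model).
    Because p is non-negative and sums to 1, mu always lies inside the
    convex hull of the basis vectors.
    """
    p = softmax(logits)
    dim = len(basis[0])
    mu = [sum(p[k] * basis[k][d] for k in range(len(basis))) for d in range(dim)]
    return p, mu


def interpolate_on_simplex(p, target_vertex, alpha):
    """Move p toward one vertex of the simplex without leaving it.

    alpha = 0 keeps p unchanged; alpha = 1 yields the one-hot vertex.
    """
    return [
        (1 - alpha) * p[k] + alpha * (1.0 if k == target_vertex else 0.0)
        for k in range(len(p))
    ]


if __name__ == "__main__":
    # Hypothetical K = 3 basis vectors in a 2-dimensional latent subspace.
    toy_basis = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    p, mu = simplex_mean([2.0, 0.5, -1.0], toy_basis)
    shifted = interpolate_on_simplex(p, target_vertex=1, alpha=0.8)
    print("weights:", p)
    print("constrained mean:", mu)
    print("weights after manipulation:", shifted)
```

Manipulating the weights rather than the raw latent code keeps the manipulated mean inside the region the decoder sees during training, which is the intuition behind performing manipulation within the simplex.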

Paper Structure

This paper contains 46 sections, 11 equations, 4 figures, 12 tables.

Figures (4)

  • Figure 1: Illustration of why latent vacancy prevents effective manipulation in VAEs. The aggregated posterior shown has multiple disconnected areas and direct manipulations of the relevant factor may fall into vacant regions of low density.
  • Figure 2: Histograms of all the test samples' negative log-likelihood (NLL) under the aggregated posterior, considering their original latent codes and manipulated ones. (A) (B) (C): three manipulation strategies for $\beta$-VAE with aggressive training; (D) CP-VAE.
  • Figure 3: Topological analysis of the highest density region (HDR) of the aggregated posterior using the mapper algorithm. The connectedness of the graph holds the key topological information; the shape on the 2D plane is irrelevant. Different $n$'s control the coarseness of visualization. If a structure persists at multiple resolutions, it is stable. If it appears and disappears for a selected value or a small range of $n$, then it is likely to be "topological noise".
  • Figure 4: Visualization of all training samples in the probability simplex: (A) with $\mathcal{L}_{\text{S-REC}}$; (B) without $\mathcal{L}_{\text{S-REC}}$.
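
The negative log-likelihood under the aggregated posterior mentioned in the Figure 2 caption can be sketched as follows. This is one common estimate, not necessarily the paper's exact procedure: it assumes the aggregated posterior is a uniform mixture of the per-sample diagonal Gaussian posteriors, and all function names and toy values are hypothetical.

```python
import math


def log_diag_gaussian(z, mu, sigma):
    """Log density of a diagonal Gaussian N(mu, diag(sigma^2)) evaluated at z."""
    return sum(
        -0.5 * math.log(2 * math.pi) - math.log(s) - 0.5 * ((x - m) / s) ** 2
        for x, m, s in zip(z, mu, sigma)
    )


def aggregated_posterior_nll(z, mus, sigmas):
    """NLL of z under q(z) = (1/N) * sum_n N(z; mus[n], diag(sigmas[n]^2)).

    Uses the log-sum-exp trick so that very small component densities
    do not underflow to zero.
    """
    log_terms = [log_diag_gaussian(z, m, s) for m, s in zip(mus, sigmas)]
    top = max(log_terms)
    log_mix = top + math.log(sum(math.exp(t - top) for t in log_terms))
    log_mix -= math.log(len(log_terms))
    return -log_mix


if __name__ == "__main__":
    # Two well-separated toy posteriors leave a vacant region between them.
    toy_mus = [[-3.0, 0.0], [3.0, 0.0]]
    toy_sigmas = [[0.5, 0.5], [0.5, 0.5]]
    print("NLL at a posterior mean:", aggregated_posterior_nll([3.0, 0.0], toy_mus, toy_sigmas))
    print("NLL in the gap:", aggregated_posterior_nll([0.0, 0.0], toy_mus, toy_sigmas))
```

A code that lands between the two toy components receives a much higher NLL than one at a posterior mean, which is the kind of shift the histograms for manipulated codes are meant to expose.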