Improving Semantic Control in Discrete Latent Spaces with Transformer Quantized Variational Autoencoders

Yingji Zhang; Danilo S. Carvalho; Marco Valentino; Ian Pratt-Hartmann; Andre Freitas

TL;DR

This work tackles the challenge of precise semantic control in NLP latent spaces by introducing T5VQVAE, a Transformer-based VQVAE that uses a discrete latent codebook $z \in \mathbb{R}^{K \times I}$ to guide token-level cross-attention. The model optimizes a composite objective that combines reconstruction, latent-space alignment, and an encoder-constraint term, with $z$ quantized via a codebook updated through EMA and typically via k-means, enabling stable training without KL-vanishing concerns. Empirically, T5VQVAE outperforms state-of-the-art VAE baselines like Optimus across autoencoding, text transfer, and inference tasks, and exhibits controllable latent-space manipulations such as traversal, interpolation, and vector arithmetic, including quasi-symbolic reasoning in inference. These results suggest a practical path toward fine-grained semantic control and potential applications in NLP and symbolic reasoning, while highlighting future work in word-level disentanglement and interpretability of discrete latent representations.
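The quantization and EMA codebook update mentioned above can be illustrated with a minimal NumPy sketch. It follows the generic VQ-VAE recipe rather than the authors' implementation: the function names `quantize` and `ema_update`, the running statistics `counts` and `sums`, and the values of `decay` and `eps` are all assumptions made for illustration. The codebook has shape $(K, I)$, matching $z \in \mathbb{R}^{K \times I}$, and `h` stands for the encoder's token embeddings.

```python
import numpy as np

def quantize(h, codebook):
    """Map each token embedding in h (T, I) to its nearest codebook row (K, I)."""
    # Squared Euclidean distance between every token and every code: shape (T, K).
    d = ((h[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)
    return codebook[idx], idx

def ema_update(codebook, counts, sums, h, idx, decay=0.99, eps=1e-5):
    """One EMA step: move each code toward the mean of the tokens assigned to it.

    decay and eps are illustrative defaults, not values taken from the paper.
    """
    K = codebook.shape[0]
    onehot = np.eye(K)[idx]  # (T, K) assignment matrix
    # Running count of assignments and running sum of assigned embeddings.
    counts = decay * counts + (1 - decay) * onehot.sum(axis=0)
    sums = decay * sums + (1 - decay) * (onehot.T @ h)
    # Laplace smoothing keeps rarely used codes from dividing by ~0.
    n = counts.sum()
    smoothed = (counts + eps) / (n + K * eps) * n
    return sums / smoothed[:, None], counts, sums

# Toy usage: 8 codes of dimension 4, 5 token embeddings.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))
h = rng.normal(size=(5, 4))
z_q, idx = quantize(h, codebook)
codebook, counts, sums = ema_update(codebook, np.ones(8), codebook.copy(), h, idx)
```

Because the codebook is moved by running averages instead of by a gradient step, codes that receive no assignments in a batch stay approximately where they were, which is one reason this style of update tends to train stably.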

Abstract

Achieving precise semantic control over the latent spaces of Variational AutoEncoders (VAEs) holds significant value for downstream tasks in NLP as the underlying generative mechanisms could be better localised, explained and improved upon. Recent research, however, has struggled to achieve consistent results, primarily due to the inevitable loss of semantic information in the variational bottleneck and limited control over the decoding mechanism. To overcome these challenges, we investigate discrete latent spaces in Vector Quantized Variational AutoEncoders (VQVAEs) to improve semantic control and generation in Transformer-based VAEs. In particular, we propose T5VQVAE, a novel model that leverages the controllability of VQVAEs to guide the self-attention mechanism in T5 at the token level, exploiting its full generalization capabilities. Experimental results indicate that T5VQVAE outperforms existing state-of-the-art VAE models, including Optimus, in terms of controllability and preservation of semantic information across different tasks such as auto-encoding of sentences and mathematical expressions, text transfer, and inference. Moreover, T5VQVAE exhibits improved inference capabilities, suggesting potential applications for downstream natural language and symbolic reasoning tasks.

Paper Structure (36 sections, 10 equations, 3 figures, 18 tables)

This paper contains 36 sections, 10 equations, 3 figures, 18 tables.

Introduction
Methodology
Model architecture.
Training T5VQVAE
Training the latent space.
Advantages of T5VQVAE.
Controllability Evaluation
Semantic Disentanglement
Interpolation Smoothness
Experiments
AutoEncoding Task
Pre-training Data.
Baselines.
Quantitative Evaluation.
Text Transfer Task
...and 21 more sections

Figures (3)

Figure 1: By controlling the token-level discrete latent space in VAEs, we aim to explicitly guide the cross-attention mechanism in T5 to improve the generation process. We focus on three challenging tasks to assess precise semantic control and inference.
Figure 2: Loss curves of T5VQVAEs (base) with and without EMA and Optimus on the WorldTree corpus.
Figure 3: t-SNE plot of the T5VQVAE latent space. Left: same role-content (PRED-is, ARG2-animal). Middle: different role-content (ARG0-PRED-ARG1, ARG1-PRED-ARG2). Right: different roles with same content (ARG0, 1, 2 - animal; ARG0, 1, 2 - water).
