Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space

Chunyuan Li; Xiang Gao; Yuan Li; Baolin Peng; Xiujun Li; Yizhe Zhang; Jianfeng Gao

TL;DR

Optimus introduces a large-scale pre-trained latent-variable language model that learns a universal sentence latent space via VAE objectives, enabling both guided generation and robust low-resource understanding. By grounding a BERT-like encoder and a GPT-2-like decoder in a shared latent space, Optimus achieves stronger representation learning, mitigates KL-vanishing through pre-training, and supports controllable generation via latent-space arithmetic and interpolation. The approach yields state-of-the-art results on VAE language modeling benchmarks, improves dialog and stylized text generation, and demonstrates notable benefits in few-shot or low-resource understanding settings. This work suggests that pre-training a meaningful latent space can make deep generative models more practical and versatile for NLP tasks in the modern pre-trained language modeling era.
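The latent-space arithmetic and interpolation mentioned above can be illustrated with a minimal sketch. It assumes sentences have already been encoded into fixed-size latent vectors. The random vectors stand in for encoder outputs, and the function names, the latent size of 32, and the analogy form `z_b - z_a + z_c` are illustrative assumptions, not the Optimus API. In a real pipeline each resulting vector would be passed to the decoder to generate a sentence.

```python
import numpy as np

def interpolate(z_a, z_b, num_steps=5):
    """Linear interpolation z_t = (1 - t) * z_a + t * z_b for t in [0, 1]."""
    ts = np.linspace(0.0, 1.0, num_steps)
    return np.stack([(1.0 - t) * z_a + t * z_b for t in ts])

def analogy(z_a, z_b, z_c):
    """Latent arithmetic: apply the offset from z_a to z_b onto z_c."""
    return z_b - z_a + z_c

# Hypothetical stand-ins for the latent codes of three encoded sentences.
rng = np.random.default_rng(0)
latent_dim = 32  # assumed size, chosen only for illustration
z_a, z_b, z_c = rng.standard_normal((3, latent_dim))

path = interpolate(z_a, z_b, num_steps=5)  # one row per point on the path
z_d = analogy(z_a, z_b, z_c)               # would be decoded into a new sentence
```

The endpoints of `path` are the two source codes themselves, so decoding along the rows should move gradually from the first sentence to the second if the latent space is smooth.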

Abstract

When trained effectively, the Variational Autoencoder (VAE) can be both a powerful generative model and an effective representation learning framework for natural language. In this paper, we propose the first large-scale language VAE model, Optimus. A universal latent embedding space for sentences is first pre-trained on a large text corpus and then fine-tuned for various language generation and understanding tasks. Compared with GPT-2, Optimus enables guided language generation from an abstract level using the latent vectors. Compared with BERT, Optimus can generalize better on low-resource language understanding tasks due to the smooth latent space structure. Extensive experimental results on a wide range of language tasks demonstrate the effectiveness of Optimus. It achieves a new state of the art on VAE language modeling benchmarks. We hope that our first pre-trained big VAE language model and its results can help the NLP community renew its interest in deep generative models in the era of large-scale pre-training, and make these principled methods more practical.
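For reference, the standard VAE training objective can be written as follows. This is the generic β-weighted evidence lower bound, given here as an assumption about the general form rather than the exact objective used for Optimus. The symbols are illustrative: $q_\phi(z \mid x)$ is the encoder, $p_\theta(x \mid z)$ the decoder, $p(z)$ the prior, and $\beta$ a weight on the regularizer.

```latex
\mathcal{L}_{\beta}(x)
  = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
  - \beta \, \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)
```

KL vanishing refers to the second term collapsing toward zero during training, in which case the decoder ignores $z$ and the latent space carries little information about the sentence.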
