A Semi-Supervised Text Generation Framework Combining a Deep Transformer and a GAN
Shengquan Wang
TL;DR
This work addresses text generation by marrying a deep Transformer language model with a GAN framework to enable semi-supervised learning through data augmentation. It uses a 24-layer GPT-2-like autoregressive model, a GAN with a minimax objective approximating the Jensen-Shannon divergence, and the Gumbel-Softmax reparameterization to handle discrete token outputs. Theoretical contributions include a convergence analysis for the GAN objective, a formal treatment of Gumbel-Softmax, and a semi-supervised learning protocol that combines synthetic text with real data to fine-tune the Transformer on an augmented dataset. The approach demonstrates that deeper models plus GAN-generated synthetic data can improve language modeling metrics, offering a pragmatic path to leveraging unlabeled text when labeled data is scarce, with implications for more data-efficient text generation and augmentation pipelines. The significance lies in providing a principled framework for training discrete-sequence generators with gradient-based optimization, while highlighting challenges such as mode collapse and suggesting future extensions to sequence-focused GAN variants and reinforcement learning techniques.
Abstract
This paper introduces a framework that connects a deep generative pre-trained Transformer language model with a generative adversarial network (GAN) for semi-supervised text generation. The proposed 24-layer model is first pre-trained without supervision on a large and diverse text corpus. A simple GAN architecture for synthetic text generation is then introduced, with Gumbel-Softmax applied to handle the discreteness of tokens. The paper also presents a semi-supervised approach in which real data is augmented with GAN samples and the Transformer is fine-tuned on the merged dataset. Detailed theoretical derivations are included, covering the proof for the min-max objective function and an extensive discussion of the Gumbel-Softmax reparameterization trick.
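To make the role of Gumbel-Softmax concrete, the following is a minimal sketch of the standard relaxation, not the paper's implementation. It perturbs token logits with Gumbel(0, 1) noise and applies a temperature-scaled softmax, which yields a continuous approximation of a one-hot token sample. The function name `gumbel_softmax`, the temperature values, and the four-token example logits are all assumptions chosen for illustration.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=None):
    """Return a relaxed (continuous) one-hot sample over len(logits) tokens.

    Illustrative sketch only: `logits` is a plain list of floats and
    `tau` is the softmax temperature (hypothetical default of 1.0).
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    if rng is None:
        rng = random.Random()

    perturbed = []
    for logit in logits:
        # Gumbel(0, 1) noise by inverse transform: g = -log(-log(u)).
        # u is clamped strictly inside (0, 1) so both logarithms are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        g = -math.log(-math.log(u))
        perturbed.append((logit + g) / tau)

    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    top = max(perturbed)
    exps = [math.exp(p - top) for p in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores over a 4-token vocabulary.
logits = [2.0, 0.5, -1.0, 0.1]

# Same seed, so both calls see identical Gumbel noise and differ only in tau.
smooth = gumbel_softmax(logits, tau=5.0, rng=random.Random(0))
sharp = gumbel_softmax(logits, tau=0.1, rng=random.Random(0))
```

With identical noise, the low-temperature sample `sharp` concentrates more mass on a single token than `smooth` does. In a differentiable framework the same computation lets gradients from a discriminator flow back into the generator's logits, which a hard `argmax` over tokens would block.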
