
Generative and Discriminative Text Classification with Recurrent Neural Networks

Dani Yogatama, Chris Dyer, Wang Ling, Phil Blunsom

TL;DR

This work compares discriminative and generative LSTM architectures for text classification, focusing on sample efficiency and robustness to distribution shifts. The authors implement a discriminative model that maximizes $p(y\mid \boldsymbol{x})$ and generative models that maximize $p(\boldsymbol{x},y)=p(\boldsymbol{x}\mid y)p(y)$, including a Shared LSTM and an Independent LSTM variant, with a unified base encoder. Empirical results show the discriminative model attains lower asymptotic error, but generative models converge to their higher asymptotic error faster and outperform discriminative models in small-data, continual, and zero-shot settings, indicating better sample efficiency and adaptation. The paper also discusses computational trade-offs, data likelihood as a tool for detecting distribution shifts, and training strategies that enable rapid incorporation of new classes in continual learning. Overall, the findings extend Ng & Jordan's theoretical pattern from linear models to nonlinear LSTMs and highlight the practical advantages of generative approaches for shifting data distributions and low-resource scenarios.
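The generative decision rule summarized above can be made concrete with a small sketch. It classifies a document $\boldsymbol{x}$ by $\arg\max_y [\log p(\boldsymbol{x}\mid y) + \log p(y)]$ and keeps one independent class-conditional model per class, loosely in the spirit of the Independent LSTM variant. As a result, a new class can be added by fitting only its own model, which is the property that makes continual learning cheap. Everything here is an assumption for illustration. The class names `ClassConditionalBigramLM` and `GenerativeClassifier` are hypothetical, an add-one-smoothed bigram model stands in for the LSTM language model, the vocabulary is fixed in advance, and the toy documents are made up.

```python
import math
from collections import Counter


class ClassConditionalBigramLM:
    """Hypothetical stand-in for p(x | y), fit on the documents of one class.

    ASSUMPTION: the paper's generative models use an LSTM language model; an
    add-one-smoothed bigram model is substituted only so this sketch runs without
    a deep-learning framework. Unlike bag-of-words naive Bayes, it conditions each
    word on preceding context (here just one word).
    """

    BOS, EOS = "<s>", "</s>"

    def __init__(self, vocab):
        # A vocabulary shared by all classes keeps p(x | y) comparable across y.
        self.vocab_size = len(set(vocab) | {self.EOS})
        self.bigrams = {}

    def fit(self, docs):
        for doc in docs:
            tokens = [self.BOS] + list(doc) + [self.EOS]
            for prev, cur in zip(tokens, tokens[1:]):
                self.bigrams.setdefault(prev, Counter())[cur] += 1
        return self

    def log_prob(self, doc):
        # log p(x | y) = sum_t log p(x_t | x_{t-1}, y), with add-one smoothing.
        tokens = [self.BOS] + list(doc) + [self.EOS]
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            counts = self.bigrams.get(prev, Counter())
            total += math.log((counts[cur] + 1) / (sum(counts.values()) + self.vocab_size))
        return total


class GenerativeClassifier:
    """Predicts argmax_y [log p(x | y) + log p(y)] using one independent LM per class."""

    def __init__(self, vocab):
        self.vocab = vocab
        self.models = {}
        self.class_counts = Counter()

    def add_class(self, label, docs):
        # Only the new class's model is trained; existing class models are untouched.
        self.models[label] = ClassConditionalBigramLM(self.vocab).fit(docs)
        self.class_counts[label] += len(docs)

    def log_prior(self, label):
        # p(y) estimated from the class frequencies seen so far.
        return math.log(self.class_counts[label] / sum(self.class_counts.values()))

    def predict(self, doc):
        return max(self.models, key=lambda y: self.models[y].log_prob(doc) + self.log_prior(y))


# Toy usage: classes arrive one at a time, as in a continual-learning stream.
train = {
    "sports": [["the", "team", "won", "the", "match"], ["a", "late", "goal", "won", "it"]],
    "business": [["shares", "fell", "after", "earnings"], ["the", "bank", "raised", "rates"]],
    "tech": [["new", "chip", "doubles", "speed"], ["the", "phone", "runs", "faster"]],
}
vocab = {w for docs in train.values() for doc in docs for w in doc}  # fixed up front (assumption)

clf = GenerativeClassifier(vocab)
clf.add_class("sports", train["sports"])
clf.add_class("business", train["business"])
print(clf.predict(["the", "team", "won"]))   # sports
clf.add_class("tech", train["tech"])          # sports/business models are not retrained
print(clf.predict(["the", "phone", "runs"]))  # tech
```

The bigram stand-in keeps the sketch dependency-free. Swapping in an LSTM would change only how `log_prob` scores a document, not the Bayes-rule prediction or the per-class training loop.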

Abstract

We empirically characterize the performance of discriminative and generative LSTM models for text classification. We find that although RNN-based generative models are more powerful than their bag-of-words ancestors (e.g., they account for conditional dependencies across words in a document), they have higher asymptotic error rates than discriminatively trained RNN models. However, we also find that generative models approach their asymptotic error rate more rapidly than their discriminative counterparts, the same pattern that Ng & Jordan (2001) proved holds for linear classification models that make more naive conditional independence assumptions. Building on this finding, we hypothesize that RNN-based generative classification models will be more robust to shifts in the data distribution. This hypothesis is confirmed in a series of experiments in zero-shot and continual learning settings that show that generative models substantially outperform discriminative models.
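The abstract's contrast between RNN-based generative classifiers and their bag-of-words ancestors comes down to how $p(\boldsymbol{x}\mid y)$ is factorized for a document $\boldsymbol{x} = (x_1, \ldots, x_T)$. The block below writes out the two training objectives and these factorizations in their standard form. It is a sketch: the exact way the LSTM state and the class $y$ are combined to produce each conditional is not spelled out here and is left abstract.

```latex
\begin{align*}
\text{Discriminative objective:}\quad & \max_\theta \sum_{i} \log p(y_i \mid \boldsymbol{x}_i;\theta)\\
\text{Generative objective:}\quad & \max_\theta \sum_{i} \big[\log p(\boldsymbol{x}_i \mid y_i;\theta) + \log p(y_i;\theta)\big]\\
\text{Bag-of-words (naive Bayes):}\quad & p(\boldsymbol{x}\mid y) = \prod_{t=1}^{T} p(x_t \mid y)\\
\text{LSTM language model:}\quad & p(\boldsymbol{x}\mid y) = \prod_{t=1}^{T} p(x_t \mid x_1,\ldots,x_{t-1}, y)\\
\text{Generative prediction:}\quad & \hat{y} = \arg\max_{y}\, \big[\log p(\boldsymbol{x}\mid y) + \log p(y)\big]
\end{align*}
```

The naive Bayes line drops the dependence on $x_1,\ldots,x_{t-1}$, which is the "more naive conditional independence assumption" the abstract refers to. The LSTM keeps that dependence through its hidden state.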

Paper Structure

This paper contains 25 sections, 1 equation, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Illustrations of our discriminative (left) and generative (right) LSTM models.
  • Figure 2: Accuracies of generative and discriminative models with varying training size.
  • Figure 3: Classification accuracies on the AG News dataset for the generative LSTM models as we introduce class 0, 1, 2, and 3.
  • Figure 4: Log likelihood of test data $p(\boldsymbol{x})$ from the generative LSTM model on the AG News dataset when training data only includes three classes. In the top plot, we exclude training examples from class 0, whereas in the bottom plot we exclude training examples from class 1. See text for details.
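The Figure 4 setup suggests how a generative classifier can flag distribution shift. Because it models $p(\boldsymbol{x}, y)$, it can also score $p(\boldsymbol{x}) = \sum_y p(\boldsymbol{x}\mid y)\,p(y)$, and documents from a class held out of training would be expected to receive lower likelihood. The sketch below computes $\log p(\boldsymbol{x})$ with log-sum-exp and flags low-likelihood documents. It rests on assumptions: the per-class log-likelihoods would come from some class-conditional model (e.g., a generative LSTM), the per-token normalization and quantile threshold are our own illustrative choices rather than the paper's procedure, the function names are hypothetical, and all numbers are made up.

```python
import math


def log_marginal(log_px_given_y, log_prior):
    """log p(x) = log sum_y p(x | y) p(y), computed stably via log-sum-exp."""
    scores = [log_px_given_y[y] + log_prior[y] for y in log_prior]
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


def fit_threshold(train_log_px, train_lengths, quantile=0.05):
    """Per-token log p(x) at the given quantile of the training data (assumed rule)."""
    per_token = sorted(lp / n for lp, n in zip(train_log_px, train_lengths))
    return per_token[int(quantile * (len(per_token) - 1))]


def is_shifted(log_px, length, threshold):
    # Flag a document whose per-token likelihood is below the training threshold.
    return log_px / length < threshold


# Hypothetical model trained on classes 1-3 only (class 0 held out, as in the top plot).
log_prior = {y: math.log(1 / 3) for y in (1, 2, 3)}
# Made-up (log p(x | y) for each class, length in tokens) pairs, for illustration only.
train = [
    ({1: -30.0, 2: -45.0, 3: -50.0}, 10),
    ({1: -52.0, 2: -36.0, 3: -49.0}, 12),
    ({1: -60.0, 2: -58.0, 3: -41.0}, 14),
]
threshold = fit_threshold(
    [log_marginal(s, log_prior) for s, _ in train], [n for _, n in train], quantile=0.0
)
seen_like = ({1: -29.0, 2: -44.0, 3: -50.0}, 10)
unseen_like = ({1: -70.0, 2: -72.0, 3: -69.0}, 11)
print(is_shifted(log_marginal(seen_like[0], log_prior), seen_like[1], threshold))      # False
print(is_shifted(log_marginal(unseen_like[0], log_prior), unseen_like[1], threshold))  # True
```

Normalizing by length is a design choice. Raw $\log p(\boldsymbol{x})$ decreases as documents get longer, so thresholding totals directly would tend to flag long documents rather than unfamiliar ones.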