Effective Use of Word Order for Text Categorization with Convolutional Neural Networks

Rie Johnson, Tong Zhang

TL;DR

The paper tackles text categorization by exploiting word order with CNNs applied directly to high-dimensional one-hot inputs. It introduces two core variants, seq-CNN and bow-CNN, plus a parallel-CNN extension that learns multiple types of region embeddings. The convolution operates on text regions of size $p$; in bow-CNN, each region is represented as a bag-of-words vector of dimension $|V|$. Empirical results on sentiment (IMDB, Elec) and topic (RCV1) tasks show that seq-CNN excels on sentiment while bow-CNN captures topic signals, and that a parallel-CNN setup further improves accuracy, outperforming traditional bag-of-$n$-gram baselines and prior CNN approaches. The authors analyze why CNNs are effective, demonstrating that learned region embeddings can generalize to unseen $n$-grams and that higher-order context can be leveraged through bow-convolution. The approach is a scalable, simpler alternative to previous methods, with strong performance and efficient GPU training, making word-order information readily usable for practical text classification tasks.

Abstract

Convolutional neural network (CNN) is a neural network that can make use of the internal structure of data such as the 2D structure of image data. This paper studies CNN on text categorization to exploit the 1D structure (namely, word order) of text data for accurate prediction. Instead of using low-dimensional word vectors as input as is often done, we directly apply CNN to high-dimensional text data, which leads to directly learning embedding of small text regions for use in classification. In addition to a straightforward adaptation of CNN from image to text, a simple but new variation which employs bag-of-word conversion in the convolution layer is proposed. An extension to combine multiple convolution layers is also explored for higher accuracy. The experiments demonstrate the effectiveness of our approach in comparison with state-of-the-art methods.
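To make the two input representations concrete, here is a minimal pure-Python sketch of how a region of $p$ words might be turned into a vector for each variant. The toy vocabulary, the sample document, and the function names `seq_region` and `bow_region` are illustrative assumptions, not the authors' code: the seq-style vector concatenates $p$ one-hot vectors (dimension $p|V|$, word order kept), while the bow-style vector marks which vocabulary words occur in the region (dimension $|V|$, order within the region dropped).

```python
# Illustrative sketch only: toy vocabulary and helper names are assumptions.
VOCAB = ["don't", "hate", "I", "it", "love"]
INDEX = {word: i for i, word in enumerate(VOCAB)}


def seq_region(tokens, start, p):
    """Concatenate p one-hot vectors; result has dimension p * |V|."""
    size = len(VOCAB)
    vec = [0] * (p * size)
    for k, word in enumerate(tokens[start:start + p]):
        # Position k inside the region selects the k-th block of size |V|.
        vec[k * size + INDEX[word]] = 1
    return vec


def bow_region(tokens, start, p):
    """Bag-of-words vector of the region; result has dimension |V|."""
    vec = [0] * len(VOCAB)
    for word in tokens[start:start + p]:
        # Word position inside the region is ignored.
        vec[INDEX[word]] = 1
    return vec


doc = ["I", "love", "it"]
seq_vec = seq_region(doc, 0, 2)   # region "I love", dimension 2 * 5 = 10
bow_vec = bow_region(doc, 0, 2)   # region "I love", dimension 5
```

With a realistic vocabulary the seq-style vector grows linearly in $p$, whereas the bow-style vector stays at $|V|$ regardless of region size, which is one way to read the summary's remark that bow-convolution can reach higher-order context.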

Paper Structure

This paper contains 17 sections, 9 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Convolutional neural network.
  • Figure 2: Convolution layer for image. Each computation unit (oval) computes a non-linear function $\boldsymbol{\sigma}( {\mathbf W} \cdot {\mathbf r}_\ell({\mathbf x}) + {\mathbf b} )$ of a small region ${\mathbf r}_\ell({\mathbf x})$ of input image ${\mathbf x}$, where weight matrix ${\mathbf W}$ and bias vector ${\mathbf b}$ are shared by all the units in the same layer.
  • Figure 3: Convolution layer for variable-sized text.
  • Figure 4: CNN with two convolution layers in parallel.
  • Figure 5: Training time (minutes) on Tesla K20. The horizontal lines are the best-performing baselines.
  • ...and 1 more figure
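The computation described in the Figure 2 caption, $\boldsymbol{\sigma}({\mathbf W} \cdot {\mathbf r}_\ell({\mathbf x}) + {\mathbf b})$ with ${\mathbf W}$ and ${\mathbf b}$ shared by all units, can be sketched in a few lines of pure Python. This is a sketch under stated assumptions rather than the authors' implementation: the rectifier is assumed for $\boldsymbol{\sigma}$, the weights and region vectors are made-up toy values, and `conv_layer` is a hypothetical helper name. It also illustrates the point of Figure 3: because the same weights are reused for every region, the number of output units simply follows the number of regions in the document.

```python
def relu(z):
    """Assumed choice for the non-linearity sigma."""
    return max(0.0, z)


def conv_layer(region_vectors, W, b):
    """Apply the same W (m x q) and b (m) to every region vector (q)."""
    outputs = []
    for r in region_vectors:
        unit = [
            relu(sum(w * x for w, x in zip(row, r)) + b_j)
            for row, b_j in zip(W, b)
        ]
        outputs.append(unit)
    return outputs


# Toy shared parameters: m = 2 output dimensions, q = 3 input dimensions.
W = [[1, 0, 1],
     [0, -1, 0]]
b = [0.0, 0.5]

# Region vectors for a shorter and a longer document (made-up values).
short_doc_regions = [[1, 1, 0], [0, 1, 1]]
long_doc_regions = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]

short_out = conv_layer(short_doc_regions, W, b)  # 2 units
long_out = conv_layer(long_doc_regions, W, b)    # 3 units
```

The layer's parameter count depends only on the shapes of `W` and `b`, not on document length, which is what allows a single layer to handle variable-sized text.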