
Comparing effectiveness of regularization methods on text classification: Simple and complex model in data shortage situation

Jongga Lee, Jaeseung Yim, Seohee Park, Changwon Lim

TL;DR

This paper investigates text classification under data shortage by comparing regularization strategies across simple and complex models on four datasets. It evaluates adversarial training (AT), the Pi model, and virtual adversarial training (VAT) applied to SWEM, CNN, and BiLSTM variants, using losses such as $J(\boldsymbol\theta,\mathbf{x},y)$, $H(\boldsymbol\theta,\mathbf{x})$, $\mathrm{MSE}$, and $\mathrm{KLD}$. The main finding is that SWEM is strong in fully supervised settings, but complex models benefit substantially from regularization when unlabeled data is available. BiLSTM(MAX) combined with VAT or AT achieves top performance on several datasets (e.g., $97.62\%$ on DBpedia). This demonstrates that distribution-smoothing regularization can stabilize training and improve generalization in data-scarce text classification, with implications for applying these priors to Transformer-based architectures in future work.
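As a reading aid, the four loss terms can be written in the standard forms used in the adversarial-training and semi-supervised learning literature. This is a sketch under stated assumptions, not the paper's exact objective. $J$ is taken to be the cross-entropy, $H$ the prediction entropy, $\mathrm{MSE}$ the Pi-model consistency term between two stochastic passes, and $\mathrm{KLD}$ the VAT divergence. The symbols $\mathbf{r}$, $\epsilon$, $\mathbf{g}$, $\hat{\boldsymbol\theta}$ (a constant copy of the current parameters), $f_{\boldsymbol\theta}$ (the vector of class probabilities), $\xi_1,\xi_2$ (independent dropout or noise draws), and $C$ (number of classes) are introduced here for illustration. For text models, $\mathbf{x}$ is read as the word-embedding input, since discrete tokens cannot be perturbed continuously.

$$
\begin{aligned}
J(\boldsymbol\theta,\mathbf{x},y) &= -\log p(y \mid \mathbf{x};\boldsymbol\theta) \\
J_{\mathrm{AT}} &= J(\boldsymbol\theta,\mathbf{x}+\mathbf{r}_{\mathrm{adv}},y),
  \qquad \mathbf{r}_{\mathrm{adv}} = \epsilon\,\frac{\mathbf{g}}{\lVert\mathbf{g}\rVert_2},
  \quad \mathbf{g} = \nabla_{\mathbf{x}} J(\hat{\boldsymbol\theta},\mathbf{x},y) \\
H(\boldsymbol\theta,\mathbf{x}) &= -\sum_{k=1}^{C} p(k \mid \mathbf{x};\boldsymbol\theta)\,\log p(k \mid \mathbf{x};\boldsymbol\theta) \\
\mathrm{MSE} &= \frac{1}{C}\,\bigl\lVert f_{\boldsymbol\theta}(\mathbf{x};\xi_1) - f_{\boldsymbol\theta}(\mathbf{x};\xi_2)\bigr\rVert_2^2 \\
\mathrm{KLD} &= \mathrm{KL}\bigl(p(\cdot \mid \mathbf{x};\hat{\boldsymbol\theta}) \,\big\Vert\, p(\cdot \mid \mathbf{x}+\mathbf{r}_{\mathrm{vadv}};\boldsymbol\theta)\bigr),
  \qquad \mathbf{r}_{\mathrm{vadv}} = \arg\max_{\lVert\mathbf{r}\rVert_2 \le \epsilon} \mathrm{KL}\bigl(p(\cdot \mid \mathbf{x};\hat{\boldsymbol\theta}) \,\big\Vert\, p(\cdot \mid \mathbf{x}+\mathbf{r};\hat{\boldsymbol\theta})\bigr)
\end{aligned}
$$

In practice $\mathbf{r}_{\mathrm{vadv}}$ is approximated with a power-iteration step. Unlike $J_{\mathrm{AT}}$, neither $\mathrm{MSE}$ nor $\mathrm{KLD}$ needs the label $y$, which is what lets them make use of unlabeled documents.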

Abstract

Text classification is the task of assigning a document to a predefined class. However, acquiring enough labeled documents, or labeling them, is expensive. In this paper, we study the effects of regularization methods on various classification models when only a few labeled examples are available. We compare a simple but effective word embedding-based model with complex models (CNN and BiLSTM). In supervised learning, adversarial training can further regularize the model. When an unlabeled dataset is available, we can regularize the model using semi-supervised learning methods such as the Pi model and virtual adversarial training. We evaluate the regularization effects on four text classification datasets (AG news, DBpedia, Yahoo! Answers, Yelp Polarity), using only 0.1% to 0.5% of the original labeled training documents. The simple model performs relatively well in fully supervised learning. With the help of adversarial training and semi-supervised learning, however, both simple and complex models can be regularized, and the complex models show better results. Although the simple model is robust to overfitting, a complex model with well-designed prior beliefs can also be robust to overfitting.
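The three regularizers named above can be illustrated on a deliberately tiny model. The sketch below assumes a SWEM-aver style classifier (a linear softmax layer over averaged word embeddings), so input gradients have a closed form. All function names, the Gaussian noise standing in for the Pi model's dropout or augmentation, and the toy numbers are hypothetical and are not the paper's implementation. As a further simplification, perturbations are applied to the pooled embedding rather than to individual word vectors.

```python
import math
import random

# Toy, hypothetical setup (not the paper's code): a SWEM-aver style classifier,
# i.e. a linear softmax layer over the average of word embeddings. Perturbations
# are applied to the pooled vector, so every input gradient has a closed form.

def swem_aver(word_vectors):
    """Average a list of word embeddings (SWEM-aver pooling)."""
    n = len(word_vectors)
    return [sum(col) / n for col in zip(*word_vectors)]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [v / s for v in exps]

def predict(W, b, e):
    """Class probabilities for pooled embedding e; W holds one weight row per class."""
    return softmax([sum(w * x for w, x in zip(row, e)) + bk for row, bk in zip(W, b)])

def grad_wrt_input(W, q, target):
    """Gradient w.r.t. e of -sum_k target_k * log q_k(e) for a linear softmax: W^T (q - target)."""
    return [sum(W[k][j] * (q[k] - target[k]) for k in range(len(W)))
            for j in range(len(W[0]))]

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against a zero vector
    return [x / norm for x in v]

def shift(e, r):
    return [a + c for a, c in zip(e, r)]

def cross_entropy(p, y):
    return -math.log(p[y])

def kl(p, q):
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

def adversarial_loss(W, b, e, y, eps):
    """AT: cross-entropy at e + r_adv, where r_adv = eps * g / ||g|| and g is the loss gradient."""
    q = predict(W, b, e)
    onehot = [1.0 if k == y else 0.0 for k in range(len(b))]
    r_adv = [eps * g for g in l2_normalize(grad_wrt_input(W, q, onehot))]
    return cross_entropy(predict(W, b, shift(e, r_adv)), y)

def vat_loss(W, b, e, eps, xi=1e-6, rng=random):
    """VAT: KL(p(e) || p(e + r_vadv)) with r_vadv from one power-iteration step; needs no label."""
    p = predict(W, b, e)  # treated as a constant target
    d = l2_normalize([rng.gauss(0.0, 1.0) for _ in e])
    q = predict(W, b, shift(e, [xi * di for di in d]))
    # With p fixed, the gradient of KL(p || q) w.r.t. the perturbation is W^T (q - p).
    r_vadv = [eps * g for g in l2_normalize(grad_wrt_input(W, q, p))]
    return kl(p, predict(W, b, shift(e, r_vadv)))

def pi_model_loss(W, b, e, noise_std, rng=random):
    """Pi model: mean squared difference between two independently noised passes; needs no label."""
    p1 = predict(W, b, [a + rng.gauss(0.0, noise_std) for a in e])
    p2 = predict(W, b, [a + rng.gauss(0.0, noise_std) for a in e])
    return sum((u - v) ** 2 for u, v in zip(p1, p2)) / len(p1)

if __name__ == "__main__":
    W = [[1.0, -0.5], [-0.3, 0.8], [0.2, 0.1]]  # 3 classes, 2-dim embeddings
    b = [0.0, 0.1, -0.1]
    e = swem_aver([[0.4, -0.1], [0.6, -0.3]])  # two toy word vectors
    rng = random.Random(0)
    print("clean CE:", cross_entropy(predict(W, b, e), 0))
    print("AT loss:", adversarial_loss(W, b, e, y=0, eps=0.1))
    print("VAT loss:", vat_loss(W, b, e, eps=0.1, rng=rng))
    print("Pi loss:", pi_model_loss(W, b, e, noise_std=0.1, rng=rng))
```

For a linear softmax, `grad_wrt_input` is exact, so no autodiff library is needed. With CNN or BiLSTM encoders, the same quantities would come from backpropagating to the embedding layer. During training, each term would be added to the supervised cross-entropy with some weight. `vat_loss` and `pi_model_loss` can be evaluated on unlabeled documents.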

Paper Structure

This paper contains 12 sections, 1 equation, 2 figures, 4 tables, 4 algorithms.

Figures (2)

  • Figure 1: Training process with and without regularization
  • Figure 2: Frequency of timesteps contributing to final feature
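Figure 2 reports how often each timestep contributes to the final feature. The sketch below shows how such a count could be computed. It assumes, as the BiLSTM(MAX) name suggests, that the final feature is a max over time of per-timestep hidden states. The function names and toy values are hypothetical, and the paper's exact counting procedure may differ.

```python
from collections import Counter

def max_pool_over_time(hidden_states):
    """Max-pool a T x D sequence of hidden states.

    Returns the pooled D-dim feature and, for each dimension, the timestep
    whose value was selected (ties go to the earliest timestep).
    """
    pooled, winners = [], []
    for j in range(len(hidden_states[0])):
        column = [h[j] for h in hidden_states]
        t_star = max(range(len(column)), key=column.__getitem__)
        pooled.append(column[t_star])
        winners.append(t_star)
    return pooled, winners

def timestep_frequency(documents):
    """Count how often each timestep supplies a pooled feature, summed over documents."""
    counts = Counter()
    for hidden_states in documents:
        _, winners = max_pool_over_time(hidden_states)
        counts.update(winners)
    return counts

if __name__ == "__main__":
    # Hypothetical hidden states for one 3-token document with 3 feature dimensions.
    doc = [[0.1, 0.9, -0.2],
           [0.7, 0.3, 0.6],
           [0.2, 0.8, 0.5]]
    print(max_pool_over_time(doc))    # ([0.7, 0.9, 0.6], [1, 0, 1])
    print(timestep_frequency([doc]))  # Counter({1: 2, 0: 1})
```

Because each feature dimension takes its value from a single timestep, a histogram of `winners` over many documents shows which positions the pooled representation actually depends on.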