Evaluating the Effectiveness of Data Augmentation for Emotion Classification in Low-Resource Settings

Aashish Arora, Elsbeth Turcan

TL;DR

This study evaluates data augmentation strategies for multi-label emotion classification in low-resource settings, comparing Back Translation with autoencoder-based methods using a pseudo-labeling pipeline. By simulating low-resource conditions with a Reddit-derived unlabeled corpus and a mapped IESO dataset, the authors show that Back Translation yields the most diverse synthetic text and that generating multiple synthetic examples per training instance (multi-fold generation) further improves classification performance. However, semantic fidelity can degrade as the number of synthetic examples per instance grows, and domain differences between Twitter-based models and Reddit data limit generalization. The work offers practical guidance for augmenting emotion classification in resource-constrained scenarios and suggests future exploration of autoregressive and long-sequence models to further enhance robustness.

Abstract

Data augmentation has the potential to improve the performance of machine learning models by increasing the amount of training data available. In this study, we evaluated the effectiveness of different data augmentation techniques for a multi-label emotion classification task using a low-resource dataset. Our results showed that Back Translation outperformed autoencoder-based approaches and that generating multiple examples per training instance led to further performance improvement. In addition, we found that Back Translation generated the most diverse set of unigrams and trigrams. These findings demonstrate the utility of Back Translation in enhancing the performance of emotion classification models in resource-limited situations.
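The abstract's diversity claim can be made concrete with a small counting exercise. The following is a minimal sketch, assuming diversity is measured as the number of distinct n-grams in a set of sentences under lowercase whitespace tokenization; the abstract does not specify the exact metric or tokenizer used, so `distinct_ngrams` and the example sentences are hypothetical illustrations rather than the authors' procedure.

```python
from typing import Iterable, List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_ngrams(sentences: Iterable[str], n: int) -> int:
    """Count distinct n-grams across sentences.

    Assumption: lowercase whitespace tokenization, chosen only for illustration.
    """
    seen = set()
    for sentence in sentences:
        tokens = sentence.lower().split()
        seen.update(ngrams(tokens, n))
    return len(seen)


# Made-up sentences: a single-word substitution versus a full paraphrase.
original = ["i feel so alone today"]
light_edit = ["i feel so lonely today"]
paraphrase = ["today i have been feeling really isolated"]

for name, augmented in [("light edit", light_edit), ("paraphrase", paraphrase)]:
    combined = original + augmented
    print(name, distinct_ngrams(combined, 1), distinct_ngrams(combined, 3))
```

Under this measure, an augmentation that rewrites the whole sentence contributes more new unigrams and trigrams to the training set than one that substitutes a single word, which is the kind of difference the abstract reports between Back Translation and the autoencoder-based methods.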

Paper Structure (25 sections, 1 equation, 4 figures, 4 tables, 1 algorithm)

Figures (4)

  • Figure 1: Examples generated with Back Translation, BERT-based, and RoBERTa-based data augmentation. Back Translation produced the most diverse set of unigrams and trigrams, while the autoencoder-based methods (BERT and RoBERTa) modified existing words or spans. The original sentence is displayed for comparison.
  • Figure 2: Examples of posts in the IESO dataset.
  • Figure 3: Example of a post in the Reddit dataset.
  • Figure 4: Impact of multi-fold generation with BERT-based data augmentation on semantic fidelity. The masked tokens in the original sentence are underlined. As the number of synthesized sentences per training example increases, the generated text conveys the meaning of the original sentence less accurately.