Reducing and Exploiting Data Augmentation Noise through Meta Reweighting Contrastive Learning for Text Classification

Guanyi Mou; Yichuan Li; Kyumin Lee

TL;DR

This work tackles the noise introduced by data augmentation in text classification by proposing MRCo, a framework that jointly learns per-sample augmentation weights and refines feature representations through contrastive learning. It combines a meta reweighting module with a bilevel optimization strategy and a weight-aware contrastive learning module aided by the LASW dequeue-enqueue mechanism to exploit high-quality augmented samples while leveraging low-weight ones. Empirical results on seven GLUE tasks show consistent improvements across Text-CNN and RoBERTa-base backbones, with notable gains on data-scarce tasks and strong ablations confirming the contributions of both components. The approach is augmentation-agnostic, plug-in compatible with existing encoders, and accompanied by an open-source implementation for reproducibility and broader reuse.

Abstract

Data augmentation has proven effective at alleviating data scarcity and improving models' generalization ability. However, the quality of augmented data can vary, especially compared with the raw/original data. To boost deep learning models' performance on augmented data in text classification tasks, we propose a novel framework that leverages both meta learning and contrastive learning to reweight the augmented samples and refine their feature representations based on their quality. As part of the framework, we propose novel weight-dependent enqueue and dequeue algorithms to utilize augmented samples' weight/quality information effectively. Through experiments, we show that our framework cooperates well with existing deep learning models (e.g., RoBERTa-base and Text-CNN) and augmentation techniques (e.g., Wordnet and Easydata) on specific supervised learning tasks. Experimental results show that, compared with the best baseline on seven GLUE benchmark datasets, our framework achieves an average absolute improvement of 1.6% (up to 4.3%) on Text-CNN encoders and an average absolute improvement of 1.4% (up to 4.4%) on RoBERTa-base encoders. We present an in-depth analysis of our framework design, revealing the non-trivial contributions of our network components. Our code is publicly available for better reproducibility.

Paper Structure (23 sections, 7 equations, 6 figures, 4 tables)

This paper contains 23 sections, 7 equations, 6 figures, 4 tables.

Introduction
Notation Terminology
Framework
Meta Reweighting Module
Contrastive Learning Module
Prior Knowledge of Contrastive Learning
Weight dependent Contrastive Learning
Overall Objective Function
Experiment
Datasets and Setup
Baseline Methods
Main Results
Analysis
Meta Reweighting Module Analysis
Contrastive Learning Analysis
...and 8 more sections

Figures (6)

Figure 1: A high-level view of the overall framework.
Figure 2: Approximated bilevel optimization procedure for meta reweighting module $\mathcal{A}$: ① utilize raw instances $(\mathbf{X}_{Task}, \mathbf{Y}_{Task})$ and augmented instances $(\hat{\mathbf{X}}, \hat{\mathbf{Y}})$ to do the forward pass for $L_{Task}$; ② backward pass for $L_{Task}$ on $\mathcal{M}$, retaining the backward propagation graph for $\mathcal{A}$; ③ update main module $\mathcal{M}$ to $\mathcal{M}^*$ through backward propagation; ④ utilize the raw meta input $(\mathbf{X}_{Meta}, \mathbf{Y}_{Meta})$ forward pass for $L_{Meta}$; ⑤ backward pass for $L_{Meta}$ to update meta reweighting module $\mathcal{A}$.
Figure 3: The structure of contrastive learning module.
Figure 4: Augmented samples weight distribution from meta reweighting module with basic encoder RoBERTa-base.
Figure 5: Hyperparameter (HP) analysis of Contrastive Learning on RoBERTa-base encoder.
...and 1 more figure
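
The five steps in the Figure 2 caption can be illustrated with a deliberately tiny sketch. It is not the authors' implementation, and it rests on several assumptions: the main module $\mathcal{M}$ is reduced to a single scalar parameter `theta` in the model `y = theta * x`; the meta reweighting module $\mathcal{A}$ is replaced by one free weight per augmented sample; squared error stands in for the classification loss; and gradients are written in closed form instead of being retained in a backward graph. All function names, data values, and learning rates are hypothetical.

```python
def grad_sq(theta, x, y):
    """Derivative of the squared error (theta * x - y) ** 2 w.r.t. theta."""
    return 2.0 * x * (theta * x - y)


def meta_reweight_step(theta, weights, raw, aug, meta, lr=0.05, meta_lr=0.5):
    """One approximated bilevel step (toy stand-in for Figure 2, steps 1-5)."""
    # Steps 1-2: gradient of the task loss on raw samples plus
    # weighted augmented samples.
    g_raw = sum(grad_sq(theta, x, y) for x, y in raw) / len(raw)
    g_aug_each = [grad_sq(theta, x, y) for x, y in aug]
    g_aug = sum(w * g for w, g in zip(weights, g_aug_each)) / len(aug)

    # Step 3: one gradient step moves the main parameter to theta_star.
    theta_star = theta - lr * (g_raw + g_aug)

    # Step 4: gradient of the meta loss, evaluated at theta_star on raw meta data.
    g_meta = sum(grad_sq(theta_star, x, y) for x, y in meta) / len(meta)

    # Step 5: chain rule through theta_star gives the meta gradient for each
    # weight; weights are clipped to [0, 1] (an assumption of this sketch).
    new_weights = []
    for w, g in zip(weights, g_aug_each):
        dtheta_star_dw = -lr * g / len(aug)
        w_new = w - meta_lr * g_meta * dtheta_star_dw
        new_weights.append(min(1.0, max(0.0, w_new)))
    return theta_star, new_weights


def run_demo(steps=20):
    """Clean data follows y = 2x; the second augmented sample has a flipped label."""
    raw = [(1.0, 2.0), (2.0, 4.0)]
    aug = [(1.0, 2.0), (1.0, -2.0)]  # (faithful augmentation, noisy augmentation)
    meta = [(1.5, 3.0)]
    theta, weights = 0.0, [0.5, 0.5]
    for _ in range(steps):
        theta, weights = meta_reweight_step(theta, weights, raw, aug, meta)
    return theta, weights


if __name__ == "__main__":
    theta, weights = run_demo()
    print("theta:", theta)
    print("weights (faithful, noisy):", weights)
```

In this toy setting the weight of the label-flipped augmented sample is pushed toward 0 and the weight of the faithful one toward 1, because raising the noisy sample's weight would move `theta_star` away from what the raw meta data prefers. In a real model the same chain rule would be handled by automatic differentiation through the retained backward graph.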
