Adverb Is the Key: Simple Text Data Augmentation with Adverb Deletion

Juhwan Choi; YoungBin Kim

TL;DR

This paper introduces a lightweight rule-based data augmentation method that deletes adverbs while preserving sentence semantics. Given an input sentence $x$, the augmented sentence is $\hat{x} = x \setminus W_{adv}$, where $W_{adv}$ is the set of adverbs in $x$ identified via POS tagging; the method is implemented with spaCy and applied to diverse tasks, including NLI. Empirical results show improved accuracy on text classification and, notably, natural language inference, outperforming baselines such as EDA, AEDA, and softEDA, with further gains when combined with curriculum data augmentation. The approach offers a low-cost, semantically faithful augmentation strategy with potential extensions to additional tasks and languages, and the authors provide public code for reproducibility.
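
The deletion step $\hat{x} = x \setminus W_{adv}$ can be illustrated with a minimal sketch. This is not the authors' released code: the function name `delete_adverbs` and the hand-tagged example are hypothetical, and the sketch assumes tokens arrive already paired with Universal POS tags, where `ADV` marks an adverb. With spaCy, such pairs could be obtained as `[(t.text, t.pos_) for t in nlp(text)]` after loading an English pipeline.

```python
from typing import List, Tuple

# A token paired with its Universal POS tag, e.g. ("really", "ADV").
TaggedToken = Tuple[str, str]


def delete_adverbs(tagged: List[TaggedToken]) -> List[str]:
    """Return the tokens of x with every token tagged ADV removed.

    Illustrative only: this mirrors x_hat = x \\ W_adv, where W_adv is the
    set of tokens whose POS tag is ADV. The original token order is kept.
    """
    return [token for token, pos in tagged if pos != "ADV"]


# Hand-tagged example (assumed tags, not the output of a real tagger).
example = [
    ("The", "DET"),
    ("movie", "NOUN"),
    ("was", "AUX"),
    ("really", "ADV"),
    ("good", "ADJ"),
    (".", "PUNCT"),
]

augmented_tokens = delete_adverbs(example)
print(augmented_tokens)
```

Working on pre-tagged tokens keeps the sketch independent of any particular tagger; a sentence that contains no adverbs is returned unchanged, and how such cases are handled in the paper's pipeline is not stated here.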

Abstract

In the field of text data augmentation, rule-based methods are widely adopted for real-world applications owing to their cost-efficiency. However, conventional rule-based approaches suffer from the possibility of losing the original semantics of the given text. We propose a novel text data augmentation strategy that avoids such phenomena through a straightforward deletion of adverbs, which play a subsidiary role in the sentence. Our comprehensive experiments demonstrate the efficiency and effectiveness of our proposed approach for not just single text classification, but also natural language inference that requires semantic preservation. We publicly released our source code for reproducibility.

Paper Structure

This paper contains 7 sections, 1 equation, and 3 tables.

Introduction
Method
Experiment
Conclusion
Implementation Details
Case Analysis and Discussion
Dataset Specifications
