Adverb Is the Key: Simple Text Data Augmentation with Adverb Deletion
Juhwan Choi, YoungBin Kim
TL;DR
This paper introduces a lightweight rule-based data augmentation method that deletes adverbs while preserving sentence semantics. Given an input text $x$, the augmented text is $\hat{x} = x \setminus W_{adv}$, where $W_{adv}$ is the set of adverbs in $x$ identified via POS tagging; the method is implemented with spaCy. Empirical results show improved accuracy on text classification and, notably, on natural language inference (NLI), outperforming baselines such as EDA, AEDA, and softEDA, with further gains when combined with curriculum data augmentation. The approach offers a low-cost, semantically faithful augmentation strategy that could be extended to additional tasks and languages, and the authors provide public code for reproducibility.
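The deletion rule $\hat{x} = x \setminus W_{adv}$ can be illustrated with a minimal sketch. This is not the authors' released code: the function names `delete_adverbs` and `augment_with_spacy` are hypothetical, the sketch removes every token tagged `ADV` (the Universal POS label spaCy uses for adverbs), and the `en_core_web_sm` model is an assumed choice. The paper's implementation may differ in details such as which adverbs are removed or how often.

```python
from typing import Iterable, Tuple


def delete_adverbs(tagged_tokens: Iterable[Tuple[str, str]],
                   adverb_tag: str = "ADV") -> str:
    """Return the text with all tokens tagged as adverbs removed.

    `tagged_tokens` is a sequence of (token, POS tag) pairs. Tokens are
    rejoined with single spaces, which is a simplification.
    """
    kept = [token for token, pos in tagged_tokens if pos != adverb_tag]
    return " ".join(kept)


def augment_with_spacy(text: str, model: str = "en_core_web_sm") -> str:
    """Same rule, with spaCy doing the POS tagging (assumed setup).

    spaCy is imported lazily so that `delete_adverbs` works without it.
    The model must be installed separately before this can run.
    """
    import spacy

    nlp = spacy.load(model)
    doc = nlp(text)
    # text_with_ws keeps each token's original trailing whitespace,
    # so punctuation spacing is preserved after deletion.
    return "".join(tok.text_with_ws for tok in doc if tok.pos_ != "ADV").strip()


# Hand-tagged example, so the result does not depend on a tagger.
tagged = [("She", "PRON"), ("quickly", "ADV"), ("closed", "VERB"),
          ("the", "DET"), ("door", "NOUN")]
augmented = delete_adverbs(tagged)  # "She closed the door"
```

In a training pipeline, a sketch like this would add `augmented` alongside the original sentence under the same label. For sentence-pair tasks such as NLI, how the rule is applied to each sentence of the pair is a design decision this example does not cover.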
Abstract
In the field of text data augmentation, rule-based methods are widely adopted for real-world applications owing to their cost-efficiency. However, conventional rule-based approaches suffer from the possibility of losing the original semantics of the given text. We propose a novel text data augmentation strategy that avoids such phenomena through a straightforward deletion of adverbs, which play a subsidiary role in the sentence. Our comprehensive experiments demonstrate the efficiency and effectiveness of our proposed approach for not just single text classification, but also natural language inference that requires semantic preservation. We publicly released our source code for reproducibility.
