Iterative Mask Filling: An Effective Text Augmentation Method Using Masked Language Modeling

Himmet Toprak Kesgin; Mehmet Fatih Amasyali

TL;DR

This work addresses the limited exploration of data augmentation in NLP by introducing Iterative Mask Filling (IMF), a method that uses a masked language model (Fill-Mask) to iteratively mask and replace words in sentences. IMF draws replacements from the model's top-$k$ predictions by probabilistic sampling, with an augmentation-intensity parameter, to generate diverse yet label-preserving sentences. The augmented samples can additionally be filtered to keep only low-loss ones, which reduces label noise. Across the News, New York, FinSent, and TwitSent datasets, IMF yields notable improvements on category classification tasks, especially with small training sets, though gains on sentiment analysis are less consistent. The study also analyzes the trade-off between language-model size and augmentation speed and shows that loss-based filtering can substantially boost performance, which informs practical deployment choices for text augmentation pipelines.
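The mask-and-replace loop summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `iterative_mask_fill`, the `Predictor` signature, whitespace tokenization, and the reading of augmentation intensity as the fraction of word positions to replace are all hypothetical choices made for the sketch. In practice the predictor would wrap a masked language model such as BERT; here a toy predictor stands in so the example runs without downloading a model.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

MASK = "[MASK]"

# Assumed interface: given the tokens (with one position masked), the masked
# index, and k, return up to k (candidate_word, score) pairs.
Predictor = Callable[[List[str], int, int], Sequence[Tuple[str, float]]]


def iterative_mask_fill(
    sentence: str,
    predict: Predictor,
    k: int = 5,
    intensity: float = 1.0,
    rng: Optional[random.Random] = None,
) -> str:
    """Replace words one at a time with sampled top-k predictions.

    `intensity` is interpreted here (an assumption) as the fraction of
    word positions that get masked and refilled.
    """
    rng = rng or random.Random()
    tokens = sentence.split()
    if not tokens:
        return sentence

    n_replace = min(len(tokens), max(0, round(intensity * len(tokens))))
    positions = rng.sample(range(len(tokens)), n_replace)

    for pos in positions:
        # Mask a single position; earlier replacements stay in the context,
        # which is what makes the procedure iterative.
        masked = tokens[:pos] + [MASK] + tokens[pos + 1:]
        candidates = list(predict(masked, pos, k))[:k]
        words = [word for word, _ in candidates]
        weights = [max(score, 0.0) for _, score in candidates]
        if not words or sum(weights) <= 0:
            continue  # keep the original word if nothing usable came back
        # Sample in proportion to the scores instead of always taking the
        # top prediction, so repeated calls give different sentences.
        tokens[pos] = rng.choices(words, weights=weights, k=1)[0]

    return " ".join(tokens)


def toy_predict(masked_tokens: List[str], pos: int, k: int):
    """Stand-in predictor with a fixed candidate list (illustration only)."""
    return [("good", 0.6), ("great", 0.3), ("fine", 0.1)][:k]


if __name__ == "__main__":
    print(iterative_mask_fill("the movie was long", toy_predict, k=3,
                              intensity=0.5, rng=random.Random(0)))
```

Passing the predictor in as a callable keeps the sampling logic separate from the model, so the same loop could be reused with language models of different sizes.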

Abstract

Data augmentation is an effective technique for improving the performance of machine learning models. However, it has not been explored as extensively in natural language processing (NLP) as it has in computer vision. In this paper, we propose a novel text augmentation method that leverages the Fill-Mask feature of the transformer-based BERT model. Our method involves iteratively masking words in a sentence and replacing them with language model predictions. We have tested our proposed method on various NLP tasks and found it to be effective in many cases. Our results are presented along with a comparison to existing augmentation methods. Experimental results show that our proposed method significantly improves performance, especially on topic classification datasets.
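The summary also mentions filtering augmented samples by loss to reduce label noise. One simple way to picture that step is sketched below. The summary does not say how the loss is computed or where the cut-off lies, so the per-sample losses (assumed to come from some classifier scoring each augmented sentence against its inherited label), the function name `keep_low_loss`, and the `keep_fraction` parameter are all assumptions made for illustration.

```python
from typing import List, Sequence


def keep_low_loss(
    samples: Sequence[str],
    losses: Sequence[float],
    keep_fraction: float = 0.5,
) -> List[str]:
    """Keep the `keep_fraction` of samples with the lowest loss.

    The surviving samples are returned in their original order.
    """
    if len(samples) != len(losses):
        raise ValueError("samples and losses must have the same length")
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be between 0 and 1")

    n_keep = int(len(samples) * keep_fraction)
    # Indices sorted by ascending loss; the first n_keep are the survivors.
    by_loss = sorted(range(len(samples)), key=lambda i: losses[i])
    kept = sorted(by_loss[:n_keep])
    return [samples[i] for i in kept]


if __name__ == "__main__":
    augmented = ["a", "b", "c", "d"]
    loss_values = [0.9, 0.1, 0.5, 0.2]
    print(keep_low_loss(augmented, loss_values, keep_fraction=0.5))
```

A high loss suggests that the augmented sentence no longer matches its inherited label, which is why dropping such samples is a plausible way to limit label noise.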

Paper Structure (11 sections, 3 figures, 3 tables, 1 algorithm)

Introduction
Literature Review
Methods
Experiments
Datasets
Training Settings
Comparison of Text Augmentations
Improving Performance of Text Augmentations
Using Different Language Models
Discussion and Limitations
Conclusions

Figures (3)

Figure 1: Example Sentence Augmentation
Figure 2: Training Set Size - Test Set Accuracy Analysis
Figure 3: Real and Augmented Sentences TSNE Representations
