GridMask Data Augmentation

Pengguang Chen, Shu Liu, Hengshuang Zhao, Xingquan Wang, Jiaya Jia

TL;DR

GridMask introduces a simple, structured information-dropping augmentation that masks a grid of squares in input images to balance deletion and information preservation. The method, governed by the parameters $r$, $d$, $\delta_x$, and $\delta_y$, consistently improves performance across ImageNet, COCO, and Cityscapes, often outperforming more complex policies like AutoAugment at far lower computational cost. Ablation studies validate the choice of hyperparameters and the importance of structured dropping over random occlusion. The technique generalizes well across tasks and can serve as a baseline policy for future augmentation searches. Overall, GridMask provides a practical, scalable, and effective augmentation strategy with broad applicability in computer vision.

Abstract

We propose a novel data augmentation method, GridMask, in this paper. It utilizes information removal to achieve state-of-the-art results in a variety of computer vision tasks. We analyze the requirements of information dropping. Then we show the limitations of existing information dropping algorithms and propose our structured method, which is simple yet very effective. It is based on the deletion of regions of the input image. Our extensive experiments show that our method outperforms the latest AutoAugment, which is far more computationally expensive because it uses reinforcement learning to find the best policies. On the ImageNet dataset for recognition, COCO2017 for object detection, and the Cityscapes dataset for semantic segmentation, our method notably improves performance over the baselines in every case. The extensive experiments demonstrate the effectiveness and generality of the new method.

Paper Structure

This paper contains 25 sections, 6 equations, 7 figures, 9 tables.

Figures (7)

  • Figure 1: Unsuccessful examples by previous strategies.
  • Figure 2: More examples of different information dropping methods (best viewed at large size).
  • Figure 3: This image shows examples of GridMask. First, we produce a mask according to the given parameters ($r$, $d$, $\delta_x$, $\delta_y$). Then we multiply it with the input image. The result is shown in the last row. In the mask, gray value is 1, representing the reserved regions; black value is 0, for regions to be deleted.
  • Figure 4: The dotted square shows one unit of the mask.
  • Figure 5: Statistics of failure cases as the size of the dropping squares increases (lower probability is better). The $x$-axis shows the range of the size of one removal unit. Our method has a much lower failure probability, and that probability grows more slowly.
  • ...and 2 more figures
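
The mask construction described in the Figure 3 caption (produce a mask from $r$, $d$, $\delta_x$, $\delta_y$, then multiply it with the input image) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. The function names `grid_mask` and `apply_grid_mask` are hypothetical, and the parameter semantics are assumptions, since the listing above only names the parameters: here $d$ is taken as the side length of one unit in pixels, $r$ as the fraction of each unit side that is kept, and $\delta_x$, $\delta_y$ as pixel offsets of the grid.

```python
import numpy as np


def grid_mask(height, width, r, d, delta_x, delta_y):
    """Build a (height, width) mask: 1 keeps a pixel, 0 deletes it.

    Assumed semantics (not confirmed by the listing above):
      d        side length of one grid unit, in pixels
      r        fraction of each unit side that is kept
      delta_x  horizontal offset of the grid, in pixels
      delta_y  vertical offset of the grid, in pixels
    """
    keep_len = int(round(r * d))  # kept strip width inside each unit

    # Position of every row and column inside its own unit.
    ys = (np.arange(height) + delta_y) % d
    xs = (np.arange(width) + delta_x) % d

    # A pixel is deleted only if both its row and its column fall in the
    # dropped part of the unit, which yields a regular grid of squares.
    dropped = (ys >= keep_len)[:, None] & (xs >= keep_len)[None, :]
    return 1.0 - dropped.astype(np.float32)


def apply_grid_mask(image, mask):
    """Multiply an (H, W, C) image by an (H, W) mask, channel by channel."""
    return image * mask[..., None]


if __name__ == "__main__":
    mask = grid_mask(8, 8, r=0.5, d=4, delta_x=0, delta_y=0)
    image = np.ones((8, 8, 3), dtype=np.float32)
    masked = apply_grid_mask(image, mask)
    print(mask.astype(int))
    print("kept fraction:", mask.mean())
```

With `r=0.5` and `d=4` on an 8×8 image, each unit drops one 2×2 square, so 48 of the 64 pixels are kept. Under these assumed semantics the kept fraction is $1 - (1 - r)^2$ whenever the image size is a multiple of $d$.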