CBM: Curriculum by Masking

Andrei Jarca; Florinel-Alin Croitoru; Radu Tudor Ionescu

TL;DR

CBM tackles data efficiency and accuracy in vision tasks by introducing a salience-guided patch-masking curriculum. It is configured by a curriculum vector $\mathbf{r} \in \mathbb{R}^N$, whose last entry $r_N$ is the maximum masking ratio, and by the number of patches per image $n$. Patches are masked according to per-patch probabilities $p_i = m_i / \sum_j m_j$ derived from gradient magnitudes $m_i$, so salient regions are masked more often than background. The method supports multiple schedules (linear, log, exp, linear-repeat) and is architecture-agnostic, delivering state-of-the-art results on CIFAR-10/100, ImageNet, Food-101, and PASCAL VOC, often surpassing prior CL methods with statistical significance ($p$-value $=0.001$). Ablations confirm the contributions of gradient-guided masking and curriculum dynamics, and further analyses show CBM’s robustness to hyperparameter choices and its synergy with other augmentations such as CutMix. Overall, CBM provides a practical, generalizable approach to curriculum learning that improves both accuracy and transfer capability across recognition and detection tasks, with open-source code available for reuse and extension.
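The sampling rule $p_i = m_i / \sum_j m_j$ can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the small `eps` added for numerical safety, and the choice to sample without replacement are assumptions made for illustration, and the per-patch gradient magnitudes are taken as a given input.

```python
import numpy as np


def patch_mask_probabilities(magnitudes, eps=1e-12):
    """Normalize per-patch gradient magnitudes m_i into p_i = m_i / sum_j m_j.

    `eps` is an assumption (not from the source): it keeps every probability
    strictly positive so that sampling also works on flat patches.
    """
    m = np.asarray(magnitudes, dtype=np.float64) + eps
    return m / m.sum()


def sample_masked_patches(magnitudes, mask_ratio, rng):
    """Pick round(mask_ratio * n) distinct patch indices, favoring salient ones."""
    p = patch_mask_probabilities(magnitudes)
    n = p.size
    k = int(round(mask_ratio * n))
    if k == 0:
        return np.array([], dtype=np.int64)
    # Sampling without replacement is an illustrative choice.
    return rng.choice(n, size=k, replace=False, p=p)


rng = np.random.default_rng(0)
m = np.array([0.1, 0.1, 4.0, 0.2])  # hypothetical magnitudes for n = 4 patches
idx = sample_masked_patches(m, mask_ratio=0.5, rng=rng)
```

With these hypothetical magnitudes, the third patch carries most of the probability mass, so it is the most likely to be masked.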

Abstract

We propose Curriculum by Masking (CBM), a novel state-of-the-art curriculum learning strategy that effectively creates an easy-to-hard training schedule via patch (token) masking, offering significant accuracy improvements over the conventional training regime and previous curriculum learning (CL) methods. CBM leverages gradient magnitudes to prioritize the masking of salient image regions via a novel masking algorithm and a novel masking block. Our approach enables controlling sample difficulty via the patch masking ratio, generating an effective easy-to-hard curriculum by gradually introducing harder samples as training progresses. CBM operates with two easily configurable parameters, i.e. the number of patches and the curriculum schedule, making it a versatile curriculum learning approach for object recognition and detection. We conduct experiments with various neural architectures, ranging from convolutional networks to vision transformers, on five benchmark data sets (CIFAR-10, CIFAR-100, ImageNet, Food-101 and PASCAL VOC), to compare CBM with conventional as well as curriculum-based training regimes. Our results reveal the superiority of our strategy compared with the state-of-the-art curriculum learning regimes. We also observe improvements in transfer learning contexts, where CBM surpasses previous work by considerable margins in terms of accuracy. We release our code for free non-commercial use at https://github.com/CroitoruAlin/CBM.
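To make the notion of a curriculum schedule concrete, the sketch below builds a per-epoch masking-ratio vector for two of the named schedules. The exact formulas are not given in this summary, so both functions are assumptions: the linear schedule is taken to rise evenly from 0 (fully visible images) to the maximum ratio, and the linear-repeat schedule is taken to restart that ramp a fixed number of times. All names are hypothetical.

```python
import numpy as np


def linear_schedule(num_epochs, max_ratio):
    """Masking ratio per epoch, rising evenly from 0 to max_ratio (assumed form)."""
    return np.linspace(0.0, max_ratio, num_epochs)


def linear_repeat_schedule(num_epochs, max_ratio, repetitions):
    """Repeat a shorter linear ramp `repetitions` times (assumed form).

    Epochs left over by the integer division are held at max_ratio.
    """
    cycle_len = num_epochs // repetitions
    cycle = np.linspace(0.0, max_ratio, cycle_len)
    ramps = np.tile(cycle, repetitions)
    tail = np.full(num_epochs - ramps.size, max_ratio)
    return np.concatenate([ramps, tail])


# Values used for illustration in the paper's schedule figure: 200 epochs, max ratio 0.5.
r_linear = linear_schedule(200, 0.5)
r_repeat = linear_repeat_schedule(200, 0.5, repetitions=4)
```

Each entry of the resulting vector could then be used as the masking ratio for the corresponding epoch.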

Paper Structure (12 sections, 8 equations, 7 figures, 8 tables, 1 algorithm)

Introduction
Related Work
Method
Experiments
Experimental Setup
Results
Ablation Results
Additional Results
Conclusion
Supplementary
Additional Qualitative Results
Comparison with CL-MAE

Figures (7)

Figure 1: An overview of Curriculum by Masking. The training starts with fully visible images. During training, the patch masking ratio is gradually increased to make the samples more difficult. The masking is predominantly focused on the more salient regions (with higher gradient magnitudes), to reduce the likelihood of producing easier images by masking the background information. Best viewed in color.
Figure 2: The proposed curriculum schedules are based on masking a certain number of image patches in each epoch. For illustration purposes, the number of epochs is $N=200$ and the maximum masking ratio is $r_N=0.5$ for all schedules. Best viewed in color.
Figure 3: Varying the number of masking patches of the linear repeat schedule, for CvT-13 on CIFAR-100. There are multiple configurations that surpass the baseline.
Figure 4: Varying the maximum masking ratio of the linear repeat schedule, for CvT-13 on CIFAR-100. All hyperparameter choices outperform the baseline.
Figure 5: Varying the number of repetitions of the linear repeat schedule, for CvT-13 on CIFAR-100. All hyperparameter values lead to better results than the baseline.
...and 2 more figures
