Mask to Adapt: Simple Random Masking Enables Robust Continual Test-Time Learning
Chandler Timm C. Doloriel
TL;DR
This work tackles the problem of distribution shifts at test time by proposing Mask to Adapt (M2A), a simple continual test-time adaptation (CTTA) method that uses a short sequence of randomly masked views and two losses to drive online adaptation without labels. By evaluating spatial (patch-based) and frequency masking within a framework inspired by masked image modeling, the authors show that a straightforward random masking schedule, coupled with mask-consistency and entropy-minimization losses, can outperform more complex masking strategies and state-of-the-art CTTA baselines on CIFAR10C/CIFAR100C and remain competitive on ImageNetC. Key contributions include a thorough analysis of masking types, ablation studies, and a demonstration that simple randomness can provide a stable adaptation curriculum without relying on calibrated uncertainty or attention signals. In practice, M2A is an efficient, easy-to-implement CTTA approach with strong robustness to corruptions and favorable computational characteristics.
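To make the two masking families concrete, here is a minimal NumPy sketch of generating a short sequence of randomly masked views. It is not the authors' implementation: the function names, the patch size, the mask ratios in `make_views`, and the radial `cutoff` separating "low" from "high" frequencies are all hypothetical choices made for illustration. Pixel masking falls out as the special case `patch=1`.

```python
import numpy as np


def patch_mask(img, ratio, patch=4, rng=None):
    """Spatial masking (hypothetical helper): zero a random fraction `ratio`
    of non-overlapping patch x patch squares. patch=1 gives pixel masking.
    img is (H, W, C); H and W are assumed divisible by `patch`."""
    rng = np.random.default_rng() if rng is None else rng
    gh, gw = img.shape[0] // patch, img.shape[1] // patch
    keep = np.ones(gh * gw, dtype=bool)
    n_drop = int(round(ratio * gh * gw))
    keep[rng.choice(gh * gw, size=n_drop, replace=False)] = False
    # Expand the patch-level keep grid back to pixel resolution.
    grid = keep.reshape(gh, gw).repeat(patch, axis=0).repeat(patch, axis=1)
    return img * grid[..., None]


def frequency_mask(img, ratio, band="all", cutoff=0.25, rng=None):
    """Frequency masking (hypothetical helper): zero a random fraction `ratio`
    of 2-D Fourier coefficients inside a band ("all", "low" or "high").
    `cutoff` is an assumed normalized radius splitting low from high."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    spec = np.fft.rfft2(img, axes=(0, 1))  # shape (H, W//2 + 1, C)
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.rfftfreq(w)[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)
    if band == "low":
        in_band = radius <= cutoff
    elif band == "high":
        in_band = radius > cutoff
    elif band == "all":
        in_band = np.ones_like(radius, dtype=bool)
    else:
        raise ValueError(f"unknown band: {band}")
    candidates = np.flatnonzero(in_band)
    n_drop = int(round(ratio * candidates.size))
    keep = np.ones(radius.size, dtype=bool)
    keep[rng.choice(candidates, size=n_drop, replace=False)] = False
    spec = spec * keep.reshape(radius.shape)[..., None]
    # irfft2 with the original spatial size returns a real-valued image.
    return np.fft.irfft2(spec, s=(h, w), axes=(0, 1))


def make_views(img, ratios=(0.1, 0.2, 0.3), kind="spatial", band="all", rng=None):
    """Build a short sequence of masked views, one per (assumed) mask ratio."""
    rng = np.random.default_rng() if rng is None else rng
    if kind == "spatial":
        return [patch_mask(img, r, rng=rng) for r in ratios]
    if kind == "frequency":
        return [frequency_mask(img, r, band=band, rng=rng) for r in ratios]
    raise ValueError(f"unknown kind: {kind}")
```

Because the masks are redrawn at random for every view, no uncertainty or attention signal is needed to decide what to hide; the only knobs in this sketch are the ratio sequence and the masking family.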
Abstract
Distribution shifts at test time degrade image classifiers. Recent continual test-time adaptation (CTTA) methods use masking to regulate learning, but they often depend on calibrated uncertainty or stable attention scores and add complexity. We ask: do we need custom-made masking designs, or can a simple random masking schedule suffice under strong corruption? We introduce Mask to Adapt (M2A), a simple CTTA approach that generates a short sequence of masked views (spatial or frequency) and adapts with two objectives: a mask consistency loss that aligns predictions across different views and an entropy minimization loss that encourages confident outputs. Motivated by masked image modeling, we study two common masking families, spatial masking and frequency masking, and further compare subtypes within each (spatial: patch vs. pixel; frequency: all vs. low vs. high). On CIFAR10C/CIFAR100C/ImageNetC (severity 5), M2A (Spatial) attains 8.3%/19.8%/39.2% mean error, respectively, outperforming or matching strong CTTA baselines, while M2A (Frequency) lags behind. Ablations further show that simple random masking is effective and robust. These results indicate that a simple random masking schedule, coupled with consistency and entropy objectives, is sufficient to drive effective test-time adaptation without relying on uncertainty or attention signals.
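The two objectives can be sketched as a loss-value computation over the logits of K masked views. The abstract does not specify how predictions are aligned, so the consistency term below is one hypothetical instantiation (KL divergence from the views' mean prediction to each view), and the weight `lam` is likewise an assumption. This NumPy version only computes the loss values; an actual online adaptation step would compute the same terms in an autodiff framework and use their gradients to update the model.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the class axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def entropy_loss(probs, eps=1e-12):
    """Entropy minimization term: mean Shannon entropy of the predictions."""
    return float(-(probs * np.log(probs + eps)).sum(axis=-1).mean())


def consistency_loss(probs, eps=1e-12):
    """Mask consistency term (hypothetical form): mean KL(p_bar || p_k), where
    p_bar is the average prediction over the K masked views."""
    target = probs.mean(axis=0, keepdims=True)  # shape (1, B, C)
    kl = (target * (np.log(target + eps) - np.log(probs + eps))).sum(axis=-1)
    return float(kl.mean())


def m2a_style_loss(view_logits, lam=1.0):
    """view_logits: (K views, B samples, C classes). `lam` is an assumed weight."""
    probs = softmax(np.asarray(view_logits, dtype=np.float64))
    return consistency_loss(probs) + lam * entropy_loss(probs)
```

When every masked view yields the same prediction, the consistency term is (numerically) zero and only the entropy term pushes the model toward confident outputs; disagreement between views adds a positive penalty on top.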
