Ranked Entropy Minimization for Continual Test-Time Adaptation
Jisu Han, Jaemin Na, Wonjun Hwang
TL;DR
This work addresses instability in continual test-time adaptation (CTTA), identifying the collapse of entropy minimization (EM) as a key failure mode, and proposes Ranked Entropy Minimization (REM). REM builds an explicit mask chain, guided by vision-transformer attention, that progressively increases input difficulty. It trains with a masked-consistency loss $\mathcal{L}_{MCL}$ and an entropy-ranking loss $\mathcal{L}_{ERL}$, combined as $\mathcal{L}_{REM}=\mathcal{L}_{MCL}+\lambda\mathcal{L}_{ERL}$, to preserve prediction diversity while adapting online. On the ImageNetC, CIFAR10C, and CIFAR100C CTTA benchmarks, REM reports state-of-the-art or competitive accuracy, improving on the source models and on previous CTTA methods, while retaining the efficiency of entropy-based methods and requiring no student-teacher ensemble. It also shows improved calibration and robust performance under various domain shifts, which supports its use in real-time, resource-constrained deployment. In short, REM stabilizes continual test-time adaptation by jointly regulating prediction difficulty and entropy within a single masked transformer model.
Abstract
Test-time adaptation aims to adapt to realistic environments in an online manner by learning during test time. Entropy minimization has emerged as a principal strategy for test-time adaptation due to its efficiency and adaptability. Nevertheless, it remains underexplored in continual test-time adaptation, where stability is more important. We observe that entropy minimization often suffers from model collapse: the model reaches a trivial solution in which it predicts a single class for all images. We propose ranked entropy minimization to mitigate this stability problem and to extend the applicability of entropy minimization to continual scenarios. Our approach explicitly structures the prediction difficulty through a progressive masking strategy. Specifically, it gradually aligns the model's probability distributions across different levels of prediction difficulty while preserving the rank order of entropy. The proposed method is extensively evaluated across various benchmarks, demonstrating its effectiveness through empirical results. Our code is available at https://github.com/pilsHan/rem
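The two ideas in the abstract, aligning predictions across difficulty levels and preserving the rank order of entropy, can be illustrated with a small sketch. It is not the authors' implementation: the concrete forms below (cross-entropy for alignment, a hinge with a margin for ranking) and all function names are assumptions chosen for illustration. The input `chain` is a list of class-probability vectors for one image, ordered from the least masked view to the most masked view.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def cross_entropy(target, pred, eps=1e-12):
    """Cross-entropy H(target, pred); target is treated as a fixed reference."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, pred))

def masked_consistency_loss(chain):
    """Assumed form: pull each harder (more masked) prediction toward the
    easier prediction that precedes it in the chain."""
    return sum(cross_entropy(easy, hard) for easy, hard in zip(chain, chain[1:]))

def entropy_ranking_loss(chain, margin=0.0):
    """Assumed form: hinge penalty whenever an easier view has higher
    entropy than the harder view that follows it."""
    return sum(
        max(0.0, entropy(easy) - entropy(hard) + margin)
        for easy, hard in zip(chain, chain[1:])
    )

def rem_loss(chain, lam=1.0, margin=0.0):
    """Combined objective L_MCL + lambda * L_ERL (lam and margin are
    illustrative hyperparameters)."""
    return masked_consistency_loss(chain) + lam * entropy_ranking_loss(chain, margin)

# Predictions grow less confident as more of the input is masked,
# so the entropy order is respected and the ranking term vanishes.
ordered = [[0.9, 0.05, 0.05], [0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]
print(entropy_ranking_loss(ordered))

# A one-hot prediction has zero entropy. A model that outputs it for every
# image minimizes plain entropy trivially, which is the collapse described above.
print(entropy([1.0, 0.0, 0.0]))
```

In this sketch the ranking term is zero whenever entropy is non-decreasing along the chain and becomes positive when the order is violated, for example when `ordered` is reversed.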
