AETTA: Label-Free Accuracy Estimation for Test-Time Adaptation
Taeckyung Lee, Sorn Chottananurak, Taesik Gong, Sung-Ju Lee
TL;DR
Test-time adaptation (TTA) enables models to cope with domain shifts using unlabeled test data, but practical deployment is hampered by adaptation failures and by the lack of ground-truth labels for monitoring performance. The authors introduce AETTA, a label-free accuracy estimator that uses prediction disagreement with dropout inferences (PDD), that is, how often the adapted model's predictions differ from those of its dropout inferences, to estimate post-adaptation accuracy. They extend this with robust disagreement calibration so that the estimate remains usable under adaptation failures. The approach is theoretically grounded through disagreement-equality results and is empirically validated on CIFAR10-C, CIFAR100-C, and ImageNet-C with six TTA methods, where it estimates accuracy an average of 19.8 percentage points more accurately than four baselines. AETTA is shown to enable effective model monitoring and practical recovery strategies in dynamic, unlabeled test streams, and the code is publicly released to encourage adoption.
Abstract
Test-time adaptation (TTA) has emerged as a viable solution to adapt pre-trained models to domain shifts using unlabeled test data. However, TTA faces challenges of adaptation failures due to its reliance on blind adaptation to unknown test samples in dynamic scenarios. Traditional methods for out-of-distribution performance estimation are limited by assumptions that are unrealistic in the TTA context, such as requiring labeled data or re-training models. To address this issue, we propose AETTA, a label-free accuracy estimation algorithm for TTA. We propose prediction disagreement as the accuracy estimate, calculated by comparing the target model's predictions with dropout inferences. We then improve the prediction disagreement to extend the applicability of AETTA under adaptation failures. Our extensive evaluation with four baselines and six TTA methods demonstrates that AETTA's estimates are, on average, 19.8 percentage points (%p) more accurate than those of the baselines. We further demonstrate the effectiveness of accuracy estimation with a case study in which model recovery is driven by the accuracy estimate, showcasing its practicality. The source code is available at https://github.com/taeckyung/AETTA.
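To make the core idea concrete, the following is a minimal sketch of a disagreement-based accuracy estimate. It is not the authors' implementation: the function name `estimate_accuracy` and its inputs are hypothetical, it assumes the predicted class labels of the adapted model and of several dropout inferences have already been computed for one test batch, and it omits the robust calibration step used to handle adaptation failures.

```python
from typing import Sequence


def estimate_accuracy(model_preds: Sequence[int],
                      dropout_preds: Sequence[Sequence[int]]) -> float:
    """Estimate accuracy as 1 minus the mean prediction disagreement.

    model_preds:   predicted labels of the adapted model, one per sample.
    dropout_preds: one list of predicted labels per dropout inference,
                   each aligned with model_preds.
    (Hypothetical interface, for illustration only.)
    """
    if not model_preds or not dropout_preds:
        raise ValueError("need at least one sample and one dropout inference")

    total = 0
    disagreements = 0
    for preds in dropout_preds:
        if len(preds) != len(model_preds):
            raise ValueError("each dropout inference must cover every sample")
        for p_model, p_drop in zip(model_preds, preds):
            total += 1
            disagreements += int(p_model != p_drop)

    # The disagreement rate serves as the error estimate, so the
    # accuracy estimate is its complement.
    return 1.0 - disagreements / total


# Four samples, two dropout inferences, two disagreements out of eight.
print(estimate_accuracy([0, 1, 2, 3], [[0, 1, 2, 0], [0, 1, 1, 3]]))  # 0.75
```

No labels appear anywhere in the computation, which is what allows such an estimate to be tracked over an unlabeled test stream.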
