A Comprehensive Survey on Test-Time Adaptation under Distribution Shifts
Jian Liang, Ran He, Tieniu Tan
TL;DR
This survey unifies three related test-time adaptation (TTA) paradigms under a single framework in which a pre-trained model is adapted to unlabeled test data before inference: test-time domain adaptation (TTDA), test-time batch adaptation (TTBA), and online test-time adaptation (OTTA). It presents a structured taxonomy of TTDA methods (pseudo-labeling, consistency training, clustering, source distribution estimation, self-supervised learning), TTBA strategies (batch normalization (BN) calibration, model optimization, meta-learning, input adaptation, dynamic inference), and OTTA techniques (BN calibration, entropy minimization, pseudo-labeling, consistency, anti-forgetting), and discusses learning scenarios, applications, and practical considerations. The paper surveys applications including image classification, semantic segmentation, object detection, NLP, graph data, and video, examines nuances of evaluation protocols, and outlines emerging trends such as memory-efficient continual adaptation and the influence of foundation models. It also identifies open theoretical questions, benchmarking needs, and trustworthiness concerns to guide future research and deployment in real-world, non-i.i.d. environments. Overall, the work clarifies how unlabeled test data can be used to sustain robust performance under distribution shifts, with implications for safety, privacy, and scalability in deployed AI systems.
Abstract
Machine learning methods strive to acquire a robust model during the training process that can effectively generalize to test samples, even in the presence of distribution shifts. However, these methods often suffer from performance degradation due to unknown test distributions. Test-time adaptation (TTA), an emerging paradigm, has the potential to adapt a pre-trained model to unlabeled data during testing, before making predictions. Recent progress in this paradigm has highlighted the significant benefits of using unlabeled data to train self-adapted models prior to inference. In this survey, we categorize TTA into several distinct groups based on the form of test data, namely, test-time domain adaptation, test-time batch adaptation, and online test-time adaptation. For each category, we provide a comprehensive taxonomy of advanced algorithms and discuss various learning scenarios. Furthermore, we analyze relevant applications of TTA and discuss open challenges and promising areas for future research. For a comprehensive list of TTA methods, kindly refer to https://github.com/tim-learn/awesome-test-time-adaptation.
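
To make the core idea concrete, the following toy sketch illustrates one of the simplest strategies in this family, recalibrating normalization statistics on an unlabeled test batch before predicting. It is not taken from the survey: the one-dimensional feature, the sign-based classifier, the data values, and the names `normalize`, `predict`, and `bn_calibrate` are all assumptions made for illustration.

```python
from statistics import fmean, pvariance


def normalize(xs, mean, var, eps=1e-5):
    """Standardize features with the given mean and variance."""
    return [(x - mean) / (var + eps) ** 0.5 for x in xs]


def predict(zs):
    """Hypothetical frozen classifier: class 1 if the normalized feature is positive."""
    return [1 if z > 0 else 0 for z in zs]


def bn_calibrate(source_mean, source_var, batch, momentum=1.0):
    """Blend stored source statistics with statistics of the unlabeled test batch.

    momentum=1.0 uses only the test batch; momentum=0.0 keeps the source statistics.
    """
    batch_mean = fmean(batch)
    batch_var = pvariance(batch, mu=batch_mean)
    mean = (1 - momentum) * source_mean + momentum * batch_mean
    var = (1 - momentum) * source_var + momentum * batch_var
    return mean, var


# Assumed statistics stored at training time on the source domain.
source_mean, source_var = 0.0, 1.0

# Unlabeled test batch: source-like values shifted by +3 (a simple covariate shift).
test_batch = [2.0, 2.5, 3.5, 4.0]
labels = [0, 0, 1, 1]  # used only to check the result, never for adaptation

# Without adaptation, the shift pushes every sample onto the positive side.
before = predict(normalize(test_batch, source_mean, source_var))

# With adaptation, the statistics are re-estimated from the test batch itself.
mean, var = bn_calibrate(source_mean, source_var, test_batch)
after = predict(normalize(test_batch, mean, var))

print("before adaptation:", before)
print("after adaptation: ", after)
```

In this toy setting the unadapted model assigns every shifted sample to class 1, while the recalibrated statistics recover the intended split without using any labels. Real methods operate on the normalization layers of a deep network and often combine this step with objectives such as entropy minimization.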
