A Comprehensive Survey on Test-Time Adaptation under Distribution Shifts

Jian Liang, Ran He, Tieniu Tan

TL;DR

This survey unifies three related test-time adaptation (TTA) paradigms under a single framework that adapts pre-trained models to unlabeled test data before inference: test-time domain adaptation (TTDA), test-time batch adaptation (TTBA), and online test-time adaptation (OTTA). It presents a structured taxonomy of TTDA methods (pseudo-labeling, consistency training, clustering, source distribution estimation, self-supervised learning), TTBA strategies (batch normalization (BN) calibration, model optimization, meta-learning, input adaptation, dynamic inference), and OTTA techniques (BN calibration, entropy minimization, pseudo-labeling, consistency, anti-forgetting), and discusses learning scenarios, applications, and practical considerations. The paper surveys a wide range of applications (image classification, semantic segmentation, object detection, NLP, graph data, video, and beyond), examines nuances of evaluation protocols, and outlines emerging trends such as memory-efficient continual adaptation and the influence of foundation models. It identifies open theoretical questions, benchmarking needs, and trustworthiness concerns to guide future research and deployment in real-world, non-i.i.d. environments. Overall, the work clarifies how unlabeled test data can be used to sustain robust performance under distribution shifts, with implications for safety, privacy, and scalability in deployed AI systems.

Abstract

Machine learning methods strive to acquire a robust model during the training process that can effectively generalize to test samples, even in the presence of distribution shifts. However, these methods often suffer from performance degradation due to unknown test distributions. Test-time adaptation (TTA), an emerging paradigm, has the potential to adapt a pre-trained model to unlabeled data during testing, before making predictions. Recent progress in this paradigm has highlighted the significant benefits of using unlabeled data to train self-adapted models prior to inference. In this survey, we categorize TTA into several distinct groups based on the form of test data, namely, test-time domain adaptation, test-time batch adaptation, and online test-time adaptation. For each category, we provide a comprehensive taxonomy of advanced algorithms and discuss various learning scenarios. Furthermore, we analyze relevant applications of TTA and discuss open challenges and promising areas for future research. For a comprehensive list of TTA methods, kindly refer to https://github.com/tim-learn/awesome-test-time-adaptation.
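As a rough illustration of "adapting before predicting", the toy sketch below recalibrates normalization statistics on an unlabeled test batch, in the spirit of the BN calibration family named in the TL;DR. It is not an algorithm taken from the survey: the one-feature threshold classifier, the function names `predict` and `adapt_statistics`, the stored source statistics, and the test values are all hypothetical choices made for this example.

```python
from statistics import mean, pstdev


def predict(xs, mu, sigma):
    """Toy classifier: standardize the single feature, then threshold at zero."""
    return [1 if (x - mu) / sigma > 0 else 0 for x in xs]


def adapt_statistics(xs, eps=1e-5):
    """Re-estimate normalization statistics from an unlabeled test batch.

    No labels are used; only the inputs themselves are needed.
    """
    return mean(xs), pstdev(xs) + eps


# Hypothetical source-domain statistics stored with the pre-trained model.
source_mu, source_sigma = 0.0, 1.0

# Unlabeled test batch: source-like values (-1.5, -0.5, 0.5, 1.5) shifted by +3.
test_batch = [1.5, 2.5, 3.5, 4.5]

# Without adaptation the shift pushes every sample over the threshold.
no_adapt = predict(test_batch, source_mu, source_sigma)

# With adaptation the statistics are recomputed on the test batch first.
mu_t, sigma_t = adapt_statistics(test_batch)
adapted = predict(test_batch, mu_t, sigma_t)

print(no_adapt)  # [1, 1, 1, 1]
print(adapted)   # [0, 0, 1, 1]
```

In this toy setting the shift is a pure offset, so re-centering on the test batch recovers the intended split. Real distribution shifts are rarely this clean, which is one reason the survey also covers optimization-based strategies such as entropy minimization and pseudo-labeling.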

Paper Structure

This paper contains 47 sections, 28 equations, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: The test-time adaptation (TTA) paradigm aims to adapt the pre-trained model to various types of unlabeled test data, including single mini-batch in (a), streaming data in (b), or an entire dataset in (c), before making predictions. During the adaptation process, either the model or the input data can be altered to improve performance against distribution shifts. The dotted green arrow indicates the test-time training phase before inference, while the blue arrow denotes pure inference.
  • Figure 2: Three representative types of pseudo-labeling, where $\theta$ represents the model parameters, and $\hat{y}_t$ (or $\bar{y}_t$) denotes the pseudo label of the instance $x_t$.
  • Figure 3: Three representative types of consistency training, where $\hat{x}_t$ represents the data variant of $x_t$, and $\theta_A$ (or $\theta_B$ and $\theta_{tea}$) denotes the model variant of $\theta$.
  • Figure 4: Two representative types of clustering-based training, where similarity is obtained based on a feature memory bank.
  • Figure 5: Three representative types of source distribution estimation, where surrogate source data is obtained through generation, translation, and selection, respectively.

Theorems & Definitions (5)

  • Definition: Domain
  • Definition: Test-Time Domain Adaptation, TTDA
  • Definition: Test-Time Instance Adaptation, TTIA
  • Definition: Test-Time Batch Adaptation, TTBA
  • Definition: Online Test-Time Adaptation, OTTA