Test-Time Adaptation with Binary Feedback

Taeckyung Lee, Sorn Chottananurak, Junsu Kim, Jinwoo Shin, Taesik Gong, Sung-Ju Lee

TL;DR

This work tackles the problem of deep models degrading under domain shifts by introducing Test-Time Adaptation with Binary Feedback (TTA-BF). It proposes BiTTA, a dual-path reinforcement-learning framework that combines Binary Feedback-guided Adaptation (BFA) on uncertain samples with Agreement-Based self-Adaptation (ABA) on confident ones, using MC-dropout to estimate uncertainty and guide sample selection. The method optimizes a joint objective via policy gradients and memory-based updates, achieving substantial gains (up to 13.3 percentage points) over state-of-the-art TTA baselines while requiring only a small amount of binary feedback. BiTTA demonstrates robust performance under severe distribution shifts with minimal labeling effort, highlighting the practical value of sparse human feedback for real-time adaptation in dynamic environments.
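The sample-selection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy linear classifier, the function names (`forward`, `mc_dropout_probs`, `split_batch`), the dropout rate, and the number of passes are all hypothetical. It also assumes that confidence is the maximum of the MC-dropout-averaged probabilities, and that "agreement" means the deterministic prediction matches the MC-dropout prediction.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, x, drop_p=0.0, rng=None):
    """Toy linear classifier (hypothetical stand-in for the real network).

    With drop_p > 0, inverted dropout is applied to the input features.
    """
    if drop_p > 0.0:
        keep = 1.0 - drop_p
        x = [xi / keep if rng.random() < keep else 0.0 for xi in x]
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return softmax(logits)


def mc_dropout_probs(weights, x, n_passes=10, drop_p=0.3, seed=0):
    """Average class probabilities over several stochastic forward passes."""
    rng = random.Random(seed)
    passes = [forward(weights, x, drop_p, rng) for _ in range(n_passes)]
    return [sum(p[c] for p in passes) / n_passes for c in range(len(weights))]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def split_batch(weights, batch, n_feedback):
    """Return (indices to query for binary feedback, indices for self-adaptation)."""
    scored = []
    for i, x in enumerate(batch):
        mc = mc_dropout_probs(weights, x, seed=i)
        agrees = argmax(mc) == argmax(forward(weights, x))
        scored.append((max(mc), i, agrees))
    # Assumption: the least confident samples are sent to the annotator.
    uncertain = [i for _, i, _ in sorted(scored)[:n_feedback]]
    # Assumption: remaining samples are kept only if both predictions agree.
    confident = [i for _, i, agrees in scored if agrees and i not in uncertain]
    return uncertain, confident
```

Calling `split_batch(weights, batch, n_feedback=1)` on a batch of feature vectors returns two disjoint index lists: one sample to show the annotator and the agreeing samples available for self-adaptation. Samples that neither receive feedback nor pass the agreement check are simply skipped in this sketch.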

Abstract

Deep learning models perform poorly when domain shifts exist between training and test data. Test-time adaptation (TTA) is a paradigm to mitigate this issue by adapting pre-trained models using only unlabeled test samples. However, existing TTA methods can fail under severe domain shifts, while recent active TTA approaches requiring full-class labels are impractical due to high labeling costs. To address this issue, we introduce a new setting of TTA with binary feedback. This setting uses a few binary feedback inputs from annotators to indicate whether model predictions are correct, thereby significantly reducing the annotators' labeling burden. Under this setting, we propose BiTTA, a novel dual-path optimization framework that leverages reinforcement learning to balance binary feedback-guided adaptation on uncertain samples with agreement-based self-adaptation on confident predictions. Experiments show that BiTTA achieves accuracy improvements of 13.3%p over state-of-the-art baselines, demonstrating its effectiveness in handling severe distribution shifts with minimal labeling effort. The source code is available at https://github.com/taeckyung/BiTTA.

Paper Structure

This paper contains 66 sections, 12 equations, 13 figures, 14 tables, 1 algorithm.

Figures (13)

  • Figure 1: Overview of TTA with binary feedback. Traditional TTA algorithms often fail under severe distribution shifts because adapting on unlabeled data alone is risky. Our proposed TTA with binary feedback addresses this challenge by providing a few binary feedback signals (correct or incorrect) on selected model predictions. TTA with binary feedback significantly improves the adaptation performance with minimal labeling effort, enabling a practical and scalable TTA paradigm for real-world applications.
  • Figure 2: Accuracy (%) of TTA methods with binary feedback on CIFAR10-C. The asterisk indicates a modified algorithm to utilize binary feedback. The dotted line is full-class active TTA (SimATTA).
  • Figure 3: Overview of BiTTA algorithm. BiTTA implements a reinforcement learning-based dual-path optimization that estimates prediction probabilities using MC-dropout. It computes policy gradients from two complementary signals: (1) Binary Feedback-guided Adaptation (BFA) on uncertain samples, using binary rewards of $\pm 1$, and (2) Agreement-Based self-Adaptation (ABA) on confident, unlabeled samples, using reward 1. By jointly optimizing both paths, BiTTA enables robust adaptation under dynamic distribution shift scenarios.
  • Figure 4: Analysis of confidence and accuracy during online adaptation. (a) Average sample-wise confidence over time and dataset, showing dynamic changes that challenge fixed thresholding methods. (b) Average sample-wise accuracy for samples with prediction agreement and disagreement on CIFAR10-C, demonstrating the effectiveness of agreement-based selection for confident samples.
  • Figure 5: Accuracy (%) with full-class feedback (SimATTA) and binary feedback (BiTTA) under equal total labeling cost. GPT-4o is used as a foundation model to provide full-class labels.
  • ...and 8 more figures
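
The two reward signals named in the Figure 3 caption (±1 for binary feedback, 1 for agreement) can be illustrated with a REINFORCE-style loss. This is a hedged sketch rather than the paper's objective: the function names are hypothetical, the class probabilities are assumed to come from MC-dropout, and the two paths are assumed to be averaged separately and summed with equal weight.

```python
import math


def reinforce_loss(probs, action, reward):
    """Policy-gradient surrogate loss: -reward * log pi(action | x)."""
    return -reward * math.log(probs[action])


def reinforce_grad_logits(probs, action, reward):
    """Gradient of the loss above w.r.t. the logits of a softmax policy."""
    return [-reward * ((1.0 if c == action else 0.0) - p)
            for c, p in enumerate(probs)]


def mean(values):
    return sum(values) / len(values) if values else 0.0


def dual_path_loss(feedback_items, agreement_items):
    """Combine both paths (equal weighting is an assumption of this sketch).

    feedback_items:  list of (probs, predicted_class, is_correct) for samples
                     that received binary feedback; reward is +1 or -1.
    agreement_items: list of (probs, predicted_class) for confident samples
                     with prediction agreement; reward is always +1.
    """
    bfa = [reinforce_loss(p, a, 1.0 if ok else -1.0)
           for p, a, ok in feedback_items]
    aba = [reinforce_loss(p, a, 1.0) for p, a in agreement_items]
    return mean(bfa) + mean(aba)
```

With a reward of +1, a gradient-descent step raises the logit of the predicted class; with a reward of -1, it lowers that logit and redistributes probability to the other classes. This is how a single "incorrect" signal can still drive adaptation without revealing the true label.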