Decoupled Prototype Learning for Reliable Test-Time Adaptation

Guowei Wang, Changxing Ding, Wentao Tan, Mingkui Tan

TL;DR

This work tackles test-time adaptation (TTA) under pseudo-label noise by introducing Decoupled Prototype Learning (DPL), a prototype-centric optimization that updates each class prototype independently rather than fitting every noisy pseudo-label. Two additions strengthen robustness: a memory bank of per-class pseudo-features, and a consistency regularization that exploits unconfident samples via AdaIN-style feature-style transfer. The authors report state-of-the-art results on domain generalization benchmarks and reliable gains for self-training-based methods on image corruption benchmarks, along with robustness to small batch sizes. The code will be released.

Abstract

Test-time adaptation (TTA) continually adapts a pre-trained source model to the target domain during inference. One popular approach fine-tunes the model with a cross-entropy loss computed on estimated pseudo-labels. However, its performance is significantly affected by noisy pseudo-labels. This study reveals that the cross-entropy loss is vulnerable to label noise because it minimizes the classification error of each individual sample. To address this issue, we propose a novel Decoupled Prototype Learning (DPL) method that features prototype-centric loss computation. First, we decouple the optimization of class prototypes. For each class prototype, we reduce its distance to positive samples and enlarge its distance to negative samples in a contrastive manner. This strategy prevents the model from overfitting to noisy pseudo-labels. Second, we propose a memory-based strategy to enhance DPL's robustness to the small batch sizes often encountered in TTA. We update each class's pseudo-feature in a memory in a momentum manner and add a second DPL loss computed on these pseudo-features. Finally, we introduce a consistency regularization-based approach to leverage samples with unconfident pseudo-labels. This approach transfers the feature styles of samples with unconfident pseudo-labels to those with confident pseudo-labels, thereby creating more reliable samples for TTA. The experimental results demonstrate that our methods achieve state-of-the-art performance on domain generalization benchmarks and reliably improve the performance of self-training-based methods on image corruption benchmarks. The code will be released.
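
To make the prototype-centric idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes one plausible reading of the abstract: each class prototype gets its own softmax over the samples in the batch (rather than each sample getting a softmax over classes), and the loss rewards placing probability mass on samples whose pseudo-label matches that prototype. The function names `prototype_centric_loss` and `update_memory`, the cosine similarity, the default temperature `tau`, and the exact momentum rule with coefficient `eta` are all assumptions made for illustration.

```python
import numpy as np


def prototype_centric_loss(features, pseudo_labels, prototypes, tau=0.1):
    """Hypothetical per-prototype contrastive loss (illustrative only).

    features:      (N, D) sample features
    pseudo_labels: (N,)   integer pseudo-labels in [0, K)
    prototypes:    (K, D) class prototypes (e.g. classifier weights)
    """
    # L2-normalize so that dot products are cosine similarities (assumption).
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)

    # Row k holds the similarity of prototype k to every sample in the batch.
    logits = (w @ f.T) / tau
    # Softmax over SAMPLES (axis=1), computed separately for each prototype.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    losses = []
    for k in range(prototypes.shape[0]):
        positives = pseudo_labels == k
        if not positives.any():
            continue  # no positive sample for this class in the batch
        # Negative log of the probability mass assigned to positive samples.
        losses.append(-np.log(np.exp(log_prob[k, positives]).sum()))
    return float(np.mean(losses)) if losses else 0.0


def update_memory(memory, features, pseudo_labels, eta=0.9):
    """Hypothetical momentum update of one pseudo-feature per class."""
    new_memory = memory.copy()
    for k in range(memory.shape[0]):
        mask = pseudo_labels == k
        if mask.any():
            batch_mean = features[mask].mean(axis=0)
            new_memory[k] = eta * memory[k] + (1.0 - eta) * batch_mean
    return new_memory
```

Under this reading, a single mislabeled sample only shifts part of one prototype's probability mass instead of contributing its own full per-sample error term, which is one intuition for the robustness claimed above. The rows of `memory` could be passed to `prototype_centric_loss` as extra features with labels `0..K-1` to form the additional memory-based loss term, again as an assumption about how the two pieces fit together.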

Paper Structure (14 sections, 10 equations, 5 figures, 8 tables)


Figures (5)

  • Figure 1: Comparison of test-time cumulative accuracy between the cross-entropy (CE) loss and the decoupled prototype learning (DPL) method during test-time adaptation on the "cartoon" and "sketch" domains of the PACS database (Li et al., 2017). First, the source model is trained on the remaining source domains using empirical risk minimization. Then, we adapt the source model to the target domain at test time using online data (59 iterations for "cartoon" and 99 iterations for "sketch").
  • Figure 2: (a) t-SNE visualization of feature spaces on the "cartoon" (row 1) and "sketch" (row 2) domains of the PACS database (Li et al., 2017) for the source model, the model optimized by the CE loss, and our DPL, respectively. DPL enables the model to learn disentangled features for the seven categories. (b) Comparison of the predictions made by CE and DPL with the ground truth ("GT"). The horizontal axis shows the categories, and the vertical axis shows the number of samples assigned to each category by the GT or the pseudo-labels. The predictions made by DPL are more consistent with the GT than those made by the CE loss. Best viewed in color.
  • Figure 3: (a) t-SNE visualization of style features (ResNet's stage-1 outputs). Source domain features are colored in orange, green, and violet. Target domain features are colored in red (confident samples $X^A$) and cyan (unconfident samples $X^B$). (b) Illustration of how samples with unconfident pseudo-labels are utilized. We identify samples with confident and unconfident pseudo-labels in the first pass, transfer feature styles from the unconfident samples to the confident ones in the second pass, and use samples with both the original and the transferred feature styles in our DPL loss.
  • Figure 4: Performance comparison between TENT (Wang et al., 2020), PL (Lee, 2013), TSD (Wang et al., 2023), DPL$^{*}$, and DPL on the four DG benchmarks with different batch sizes.
  • Figure 5: Performance of DPL with different hyper-parameter values, including (a) the temperature $\tau$, (b) the momentum coefficient $\eta$, (c) the trade-off parameter $\beta$, and (d) the range of updated parameters. The solid and dashed lines in (d) stand for the performance of DPL and PL, respectively.
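
The style transfer described in the Figure 3(b) caption can be sketched with standard adaptive instance normalization (AdaIN), which the TL;DR names as the style of transfer used. The sketch below is illustrative only: the function name `adain`, the `(C, H, W)` feature-map layout, and the `eps` constant are assumptions, and how confident and unconfident samples are paired is not specified in this summary.

```python
import numpy as np


def adain(content, style, eps=1e-5):
    """Standard AdaIN on single feature maps of shape (C, H, W).

    content: feature map of a sample with a confident pseudo-label
    style:   feature map of a sample with an unconfident pseudo-label
    Returns the content features re-normalized to the per-channel
    mean and standard deviation of the style features.
    """
    # Per-channel statistics over the spatial dimensions.
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps  # avoid divide-by-zero
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)

    # Whiten the content statistics, then apply the style statistics.
    return s_std * (content - c_mean) / c_std + s_mean


# Usage: random stand-ins for early-stage feature maps.
rng = np.random.default_rng(0)
confident_feat = rng.normal(0.0, 1.0, size=(3, 8, 8))
unconfident_feat = rng.normal(5.0, 2.0, size=(3, 8, 8))
stylized = adain(confident_feat, unconfident_feat)
```

The stylized features keep the spatial structure of the confident sample, and therefore its more trustworthy pseudo-label, while taking on the channel statistics of the unconfident sample. This matches the caption's description of transferring styles from unconfident samples to confident ones.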