Adaptive Debiasing Tsallis Entropy for Test-Time Adaptation

Xiangyu Wu, Dongming Jiang, Feng Yu, Yueying Tian, Jiaqi Tang, Qing-Guo Chen, Yang Yang, Jianfeng Lu

TL;DR

This work tackles bias in test-time adaptation for vision-language models like CLIP by moving beyond Shannon Entropy to Tsallis Entropy, introducing a non-extensive parameter $q$ to better handle biased prediction distributions. It further proposes Adaptive Debiasing Tsallis Entropy (ADTE), which assigns class-specific exponents $q^l$ learned from streaming test data via a memory-based bias estimation, enabling bias-aware high-confidence view selection and seamless logit adjustment. The approach unifies entropy-based uncertainty with label calibration, yielding state-of-the-art results on ImageNet and 10 cross-domain benchmarks across CLIP backbones and prompt types. The results demonstrate that ADTE provides robust, distribution-aware test-time adaptation without distribution-specific hyperparameter tuning, offering practical gains for deployment in diverse domains.
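For reference, the standard textbook definition of Tsallis Entropy for a probability vector is written out below. The symbols $p_l$ (predicted probability of class $l$) and $L$ (number of classes) are notation assumed here for illustration and may differ from the paper's own.

```latex
S_q(p) = \frac{1}{q-1}\left(1 - \sum_{l=1}^{L} p_l^{\,q}\right), \quad q \neq 1,
\qquad
\lim_{q \to 1} S_q(p) = -\sum_{l=1}^{L} p_l \ln p_l .
```

The limit shows why Shannon Entropy is the special case $q = 1$: any method that tunes $q$ includes the Shannon Entropy setting among its candidates.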

Abstract

Mainstream Test-Time Adaptation (TTA) methods for vision-language models such as CLIP typically rely on Shannon Entropy (SE) at test time to measure prediction uncertainty and inconsistency. However, because CLIP carries a built-in bias from pretraining on highly imbalanced web-crawled data, SE inevitably produces biased uncertainty estimates. To address this issue, we find and demonstrate that Tsallis Entropy (TE), a generalized form of SE, is naturally suited to characterizing biased distributions through its non-extensive parameter $q$, with the performance of SE serving as a lower bound for TE. Building on this, we generalize TE into Adaptive Debiasing Tsallis Entropy (ADTE) for TTA, which assigns each category a class-specific parameter $q^l$ derived by normalizing the label bias estimated from continuously incoming test instances. This adaptive approach allows ADTE to accurately select high-confidence views and to integrate seamlessly with a label adjustment strategy to enhance adaptation, without introducing distribution-specific hyperparameter tuning. In addition, our investigation reveals that both TE and ADTE can serve as direct, advanced alternatives to SE in TTA, without any other modifications. Experimental results show that ADTE outperforms state-of-the-art methods on ImageNet and its five variants, and achieves the highest average performance on 10 cross-domain benchmarks, regardless of the model architecture or text prompts used. Our code is available at https://github.com/Jinx630/ADTE.
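To make the idea of swapping SE for TE concrete, here is a minimal NumPy sketch of entropy-based view selection. It is not taken from the authors' repository: the function names, the `keep_ratio` parameter, and in particular the per-class form in `classwise_tsallis_entropy` are assumptions chosen for illustration. The per-class form is simply one natural extension that reduces to ordinary TE when every class shares the same $q$; the paper's actual ADTE definition and its bias-estimation procedure may differ.

```python
import numpy as np


def shannon_entropy(p, eps=1e-12):
    """Shannon Entropy of each row of p (natural log)."""
    p = np.asarray(p, dtype=np.float64)
    return -np.sum(p * np.log(p + eps), axis=-1)


def tsallis_entropy(p, q):
    """Standard Tsallis Entropy; falls back to Shannon Entropy at q = 1."""
    p = np.asarray(p, dtype=np.float64)
    if np.isclose(q, 1.0):
        return shannon_entropy(p)
    return (1.0 - np.sum(p ** q, axis=-1)) / (q - 1.0)


def classwise_tsallis_entropy(p, q_per_class, eps=1e-12):
    """Hypothetical per-class variant (an assumption, not the paper's formula).

    Each class l contributes (p_l - p_l**q_l) / (q_l - 1), which sums to the
    standard Tsallis Entropy when all q_l are equal. Assumes every q_l > 0.
    """
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q_per_class, dtype=np.float64)
    near_one = np.isclose(q, 1.0)
    safe_q = np.where(near_one, 2.0, q)  # placeholder to avoid dividing by zero
    tsallis_term = (p - p ** safe_q) / (safe_q - 1.0)
    shannon_term = -p * np.log(p + eps)
    return np.sum(np.where(near_one, shannon_term, tsallis_term), axis=-1)


def select_confident_views(probs, q, keep_ratio=0.1):
    """Return indices of the lowest-entropy (most confident) augmented views."""
    ent = tsallis_entropy(probs, q)
    k = max(1, int(round(keep_ratio * len(ent))))
    return np.argsort(ent)[:k]
```

In a TTA loop, `probs` would hold the softmax outputs for the augmented views of one test image, and the selected indices would pick the views whose predictions are aggregated. How $q^l$ is estimated from streaming test data is left out of this sketch because the summary above does not specify it.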

Paper Structure (18 sections, 38 equations, 7 figures, 24 tables, 2 algorithms)

Figures (7)

  • Figure 1: (a) VLM bias, showing higher confidence and accuracy for head classes and lower confidence and accuracy for tail classes. (b) The standard Shannon Entropy (SE)-based method is widely used in TTA. (c) and (d) Our proposed method, which uses Tsallis Entropy (TE) and Adaptive Debiasing Tsallis Entropy (ADTE) for selecting high-confidence views.
  • Figure 2: Comparison between SE and TE.
  • Figure 3: TE at different $q$ values vs. SE; the red dashed line marks the optimal $q$ of TE.
  • Figure 4: Different number of views.
  • Figure 5: Computational cost and effect of different intervals.
  • ...and 2 more figures

Theorems & Definitions (1)

  • Definition 1