Open-World Test-Time Adaptation with Hierarchical Feature Aggregation and Attention Affine

Ziqiong Liu, Yushun Tang, Junyang Ji, Zhihai He

TL;DR

This paper tackles open-world test-time adaptation by addressing OOD samples that can mislead online updates. It introduces a Hierarchical Ladder Network (HLN) that aggregates OOD cues from class tokens across all Transformer layers and combines its predictions with the base model through weighted probability fusion, enhancing OOD detection without sacrificing ID accuracy. To handle domain drift, an Attention Affine Network (AAN) adaptively refines the self-attention via affine transformations of the QKV projections, complemented by a patch-wise similarity loss to stabilize domain adaptation. A self-weighted entropy mechanism downweights uncertain samples, and a joint objective balances entropy minimization, OOD discrimination, and inter-patch consistency during test-time updates. Across ImageNet-based benchmarks (including ImageNet-C, ImageNet-R, and ImageNet-A), the method consistently surpasses prior TTA methods, demonstrating strong robustness to open-world distribution shifts and practical applicability for real-world deployment.
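The paper says only that the AAN applies "affine transformations of the QKV projections" conditioned on token information. The NumPy code below is a minimal sketch of that idea under stated assumptions: a frozen single-head QKV projection is modulated by a per-channel scale and shift, and a linear map predicts both from the mean token. The names `affine_attention`, `identity_aan`, `w_cond` and `b_cond` are hypothetical. Multi-head splitting, the output projection and the patch-wise similarity loss are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def affine_attention(tokens, w_qkv, w_cond, b_cond):
    """Single-head self-attention whose Q, K, V are refined by a
    token-conditioned affine map (assumed form, not the paper's code).

    tokens: (n, d) token embeddings (class token + patch tokens)
    w_qkv:  (d, 3d) frozen QKV projection of the pretrained block
    w_cond: (d, 6d), b_cond: (6d,) hypothetical AAN parameters that map
            the mean token to per-channel scales and shifts for Q, K, V
    """
    n, d = tokens.shape
    qkv = tokens @ w_qkv                              # (n, 3d)
    cond = tokens.mean(axis=0) @ w_cond + b_cond      # (6d,)
    gamma, beta = cond[:3 * d], cond[3 * d:]
    qkv = qkv * gamma + beta                          # affine refinement, broadcast over tokens
    q, k, v = np.split(qkv, 3, axis=1)                # each (n, d)
    attn = softmax(q @ k.T / np.sqrt(d))              # (n, n) attention weights
    return attn @ v                                   # (n, d)

def identity_aan(d):
    """Initialise the AAN so that gamma = 1 and beta = 0, which
    reproduces the frozen attention block exactly."""
    w_cond = np.zeros((d, 6 * d))
    b_cond = np.concatenate([np.ones(3 * d), np.zeros(3 * d)])
    return w_cond, b_cond
```

If the conditioning layer starts at scale 1 and shift 0, the adapted block reproduces the pretrained attention exactly. Test-time updates to `w_cond` and `b_cond` therefore begin from the source model rather than from a random perturbation of it.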

Abstract

Test-time adaptation (TTA) adjusts a model during testing to cope with shifts in the sample distribution and improve its adaptability to new environments. In real-world scenarios, models often encounter samples from unseen (out-of-distribution, OOD) categories. Misclassifying these as known (in-distribution, ID) classes not only degrades predictive accuracy but can also impair the adaptation process, leading to further errors on subsequent ID samples. Many existing TTA methods suffer substantial performance drops under such conditions. To address this challenge, we propose a Hierarchical Ladder Network (HLN) that extracts OOD features from class tokens aggregated across all Transformer layers. OOD detection is improved by combining the original model's prediction with the HLN output via weighted probability fusion. To improve robustness under domain shift, we further introduce an Attention Affine Network (AAN) that adaptively refines the self-attention mechanism, conditioned on token information, to better track domain drift and thereby improve classification accuracy on domain-shifted datasets. Additionally, a weighted entropy mechanism dynamically suppresses the influence of low-confidence samples during adaptation. Experiments on widely used classification benchmarks show that our method significantly improves performance.
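The abstract names two scoring rules but gives no formulas for them. The sketch below assumes one plausible form for each. The fused OOD score mixes an HLN outlier probability with one minus the base model's maximum softmax probability, using a hypothetical weight `alpha`. The self-weighted entropy weights each sample's entropy by exp(-H / log C), so uncertain samples count less. Both formulas are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def softmax(logits):
    # Row-wise numerically stable softmax for a (batch, classes) array.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def entropy(probs, eps=1e-12):
    # Shannon entropy (nats) of each row of a probability matrix.
    return -(probs * np.log(probs + eps)).sum(axis=1)

def fused_ood_score(base_logits, hln_ood_prob, alpha=0.5):
    """Assumed weighted probability fusion: a higher score means more likely OOD.

    hln_ood_prob: (batch,) hypothetical HLN outlier probabilities
    alpha:        hypothetical fusion weight in [0, 1]
    """
    msp = softmax(base_logits).max(axis=1)      # base model's maximum softmax probability
    return alpha * hln_ood_prob + (1.0 - alpha) * (1.0 - msp)

def self_weighted_entropy(base_logits):
    """Assumed self-weighted entropy loss: each sample's entropy is
    weighted by exp(-H / log C), down-weighting uncertain samples."""
    probs = softmax(base_logits)
    h = entropy(probs)
    w = np.exp(-h / np.log(probs.shape[1]))     # weight in (0, 1]; lower for high entropy
    return float((w * h).sum() / w.sum())
```

In an autodiff framework the weights `w` would normally be detached, so that they rescale each sample's gradient without receiving any gradient themselves.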

Paper Structure

This paper contains 24 sections, 16 equations, 8 figures, 7 tables, 1 algorithm.

Figures (8)

  • Figure 1: Diagram of our method. An entropy-based filter is applied to identify high-confidence ID and OOD samples. The ID samples are used to update the Attention Affine Network for domain shift adaptation, while the OOD samples are used to update the Hierarchical Ladder Network for improved OOD detection.
  • Figure 2: Overview of our proposed Hierarchical Feature Aggregation and Attention Affine.
  • Figure 3: t-SNE visualization of the class tokens from different layers.
  • Figure 4: AUROC curves of STAMP and our method on the first 4000 and the last 4000 samples during adaptation.
  • Figure 5: Effect of different values of the ID coefficient hyperparameter on AUC scores.
  • ...and 3 more figures
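Figure 1 routes test samples through an entropy-based filter. The sketch below is a minimal version that assumes entropy normalized by log C and two hypothetical thresholds, `tau_id` and `tau_ood`. The paper's actual criteria and values are not given here. Confident ID samples would feed the AAN update, confident OOD samples would feed the HLN update, and ambiguous samples are skipped.

```python
import numpy as np

def split_by_entropy(probs, tau_id=0.4, tau_ood=0.8):
    """Return index arrays (id_idx, ood_idx) of confident ID and OOD samples.

    probs:   (batch, C) softmax outputs of the model
    tau_id:  hypothetical upper bound on normalised entropy for confident ID
    tau_ood: hypothetical lower bound on normalised entropy for confident OOD
    """
    c = probs.shape[1]
    # Entropy divided by log(C) so it lies in [0, 1] regardless of class count.
    h = -(probs * np.log(np.clip(probs, 1e-12, 1.0))).sum(axis=1) / np.log(c)
    id_idx = np.flatnonzero(h < tau_id)     # low entropy: used to update the AAN
    ood_idx = np.flatnonzero(h > tau_ood)   # high entropy: used to update the HLN
    return id_idx, ood_idx
```

Samples whose normalized entropy falls between the two thresholds go to neither branch. This keeps ambiguous inputs from steering either network during online updates.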