Enhanced Online Test-time Adaptation with Feature-Weight Cosine Alignment

WeiQin Chuah; Ruwan Tennakoon; Alireza Bab-Hadiashar

TL;DR

The paper tackles Online Test-Time Adaptation (OTTA) under distributional shifts without access to source data, addressing the shortcomings of entropy minimization (EM) through a cosine-based alignment strategy. It introduces CoMM (Cosine Max-Min), a dual-objective loss that maximizes the cosine similarity between target features and the predicted class weight while suppressing alignment with non-predicted classes, with the core expression $\mathcal{L}_{CoMM} = -\frac{1}{N}\sum_{i=1}^{N}\log{\frac{\cos(\theta_{\omega_{\hat{c}_i}, z_i})}{\sum_j \cos(\theta_{\omega_j, z_i})}}$. The approach formalizes the OTTA problem setup, analyzes EM's gradient ambiguities, and demonstrates CoMM's superiority through extensive experiments on CIFAR-10-C, CIFAR-100-C, ImageNet-C, Office-Home, and DomainNet, including ablations that confirm the necessity of the dual-objective design. Results show consistent, state-of-the-art improvements across corruptions and domain shifts, with robust performance at varying batch sizes. Overall, CoMM provides a practical and effective OTTA solution that robustly aligns target representations with source classifier weights, enhancing both precision and adaptability in real-world deployment.
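The CoMM expression above can be evaluated directly. Below is a minimal NumPy sketch of the formula exactly as written in this TL;DR. It is not the authors' implementation. It assumes a bias-free linear classifier whose weight rows play the role of the class weights ω_j, and it takes the predicted class ĉ_i as the argmax of the linear logits. The function names are hypothetical. As written, the formula takes the log of a ratio of cosines, which is only defined when that ratio is positive and finite. The summary does not say how negative cosines are handled, so the sketch raises an error in that case rather than guessing.

```python
import numpy as np

def cosine_matrix(z, w, eps=1e-12):
    """Cosine similarity cos(theta_{w_j, z_i}) for every feature z_i (rows of z,
    shape N x D) and class weight w_j (rows of w, shape C x D); returns N x C."""
    z_unit = z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)
    w_unit = w / (np.linalg.norm(w, axis=1, keepdims=True) + eps)
    return z_unit @ w_unit.T

def comm_loss(z, w):
    """L_CoMM as written in the TL;DR:
    -(1/N) * sum_i log( cos(w_{c_i}, z_i) / sum_j cos(w_j, z_i) ).
    Assumption: the predicted class c_i is the argmax of bias-free logits z @ w.T."""
    cos = cosine_matrix(z, w)                       # N x C cosine similarities
    pred = (z @ w.T).argmax(axis=1)                 # hypothetical choice of c_i
    ratio = cos[np.arange(len(z)), pred] / cos.sum(axis=1)
    if not np.all(np.isfinite(ratio) & (ratio > 0)):
        # The log is undefined here; the summary does not say how this case is handled.
        raise ValueError("cosine ratio must be positive and finite for the log")
    return -np.mean(np.log(ratio))

# Usage with hypothetical features (N=2, D=2) and class weights (C=2).
z = np.array([[1.0, 0.0], [0.0, 1.0]])
w = np.array([[1.0, 0.0], [0.6, 0.8]])
print(comm_loss(z, w))
```

For these inputs, the first sample has cosines (1, 0.6) with predicted class 0. The second has cosines (0, 0.8) with predicted class 1. The loss is therefore -ln(0.625)/2 ≈ 0.235.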

Abstract

Online Test-Time Adaptation (OTTA) has emerged as an effective strategy for handling distributional shifts: it adapts pre-trained models on the fly to new target domains during inference, without access to source data. We found that entropy minimization (EM), a widely studied OTTA method, suffers from noisy gradients caused by ambiguity near decision boundaries and by incorrect low-entropy predictions. To overcome these limitations, this paper introduces a novel cosine alignment optimization approach with a dual-objective loss function. Specifically, our method optimizes the cosine similarity between feature vectors and class weight vectors, improving both the precision of class predictions and the model's adaptability to novel domains. Our method outperforms state-of-the-art techniques and sets a new benchmark on multiple datasets, including CIFAR-10-C, CIFAR-100-C, ImageNet-C, Office-Home, and DomainNet, demonstrating high accuracy and robustness against diverse corruptions and domain shifts.
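The abstract attributes EM's noisy gradients to ambiguity near decision boundaries. The small NumPy sketch below illustrates why, using the standard gradient of the softmax entropy with respect to the logits, dH/dz_k = -p_k (log p_k + H). A gradient-descent step on H raises logit k exactly when p_k > e^(-H). Because -H is the p-weighted average of log p_k, a prediction split between two classes can put both of them above that threshold. EM then pushes both logits up at once, which is the behavior Figure 1 describes. The probabilities are a hypothetical toy input, not data from the paper, and the helper names are illustrative.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

def entropy(logits):
    """Shannon entropy H(p) of the softmax prediction."""
    p = softmax(logits)
    return -np.sum(p * np.log(p))

def entropy_grad_wrt_logits(logits):
    """Analytic gradient of H(p) = -sum_k p_k log p_k with respect to the logits:
    dH/dz_k = -p_k * (log p_k + H)."""
    p = softmax(logits)
    return -p * (np.log(p) + entropy(logits))

# Hypothetical 3-class prediction split between classes 0 and 1 (near a decision boundary).
logits = np.log(np.array([0.45, 0.45, 0.10]))
step = -entropy_grad_wrt_logits(logits)        # gradient-descent direction on H
print("update direction:", np.round(step, 3))
print("logits pushed up:", np.flatnonzero(step > 0))
```

For p = (0.45, 0.45, 0.10), H ≈ 0.949 and e^(-H) ≈ 0.387. Classes 0 and 1 therefore both receive an increasing update, while class 2 is pushed down.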

Paper Structure (13 sections, 5 equations, 6 figures, 8 tables)

Introduction
Related Work
Proposed Method
Problem Setup
Revisiting the Limitations of Entropy Minimization
Cosine Alignment Optimization
Experimental Results
Robustness to Corruptions
Domain Adaptation beyond Image Corruption Shifts
Discussion
Conclusions
Comprehensive Experimental Results Comparison on ImageNet-C across all five levels of severity.
Additional Qualitative Analysis

Figures (6)

Figure 1: Analytical comparison of the optimization behavior of entropy minimization (EM) and our proposed method (CoMM). Panel (a) shows the EM loss on the probability simplex for a three-class toy problem, together with optimization trajectories. For initializations within certain regions (highlighted in gray), EM steers predictions toward a lower-entropy area. In doing so, however, it pushes up at least two logits, so the model aligns with multiple classes and becomes confused. Panel (b) shows EM's limitations on CIFAR-10-C examples. In the top row, EM increases the logits of both the correct class 6 (frog) and an incorrect class 3 (cat), leading to ambiguity. In the bottom row, EM raises the logit of the correct class 3 (cat) but also those of the incorrect classes 0 (airplane) and 1 (automobile). In contrast, CoMM reduces entropy in a more targeted way, indicative of decisive classification.
Figure 2: Distribution of entropy (left) and cosine similarity (right) for correctly (green) and incorrectly (red) classified out-of-domain samples in CIFAR-10-C across all corruption types at severity level 5. These distributions provide insights into the discriminative power of the model's predictions, reflecting the impact of data corruption on prediction certainty and feature-class alignment.
Figure 3: Histograms of the cosine similarity of the maximally aligned class for correct (green) and incorrect (red) predictions on CIFAR-10-C under four corruption types. Correct predictions are associated with higher cosine similarity scores, whereas incorrect predictions are spread more uniformly across the lower range of cosine similarity values. This pattern indicates a strong correlation between higher cosine similarity and prediction accuracy. It supports cosine similarity as an indicator of correctness under distribution shift, as also suggested by Figure 2 (right).
Figure 4: Visualization of optimization trajectories on the probability simplex for various online test-time adaptation methods, namely entropy minimization (EM), hard pseudo label (PL), CoM and CoMM loss functions. The color gradient indicates entropy levels, with darker regions corresponding to lower entropy.
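Figures 2 and 3 compare two per-sample statistics for correct and incorrect predictions: the prediction entropy, and the cosine similarity of the maximally aligned class. Below is a minimal NumPy sketch of how such statistics could be computed from penultimate-layer features and classifier weights, assuming a bias-free linear head. The function name is hypothetical. The random arrays are placeholders that stand in for real CIFAR-10-C features, so the histograms they produce say nothing about the paper's results.

```python
import numpy as np

def entropy_and_max_cosine(features, weights, labels):
    """Per-sample prediction entropy, cosine similarity of the maximally aligned
    class, and a correctness mask, assuming a bias-free linear head."""
    logits = features @ weights.T                           # N x C
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)                       # softmax probabilities
    ent = -np.sum(p * np.log(p + 1e-12), axis=1)
    f_unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    w_unit = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    max_cos = (f_unit @ w_unit.T).max(axis=1)               # maximally aligned class
    correct = p.argmax(axis=1) == labels
    return ent, max_cos, correct

# Placeholder data only: replace with real features, classifier weights, and labels.
rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 64))
W = rng.normal(size=(10, 64))
y = rng.integers(0, 10, size=500)

ent, max_cos, ok = entropy_and_max_cosine(feats, W, y)
# Histograms of the kind shown in Figures 2 and 3, split by correctness.
hist_correct, edges = np.histogram(max_cos[ok], bins=20, range=(-1.0, 1.0))
hist_wrong, _ = np.histogram(max_cos[~ok], bins=20, range=(-1.0, 1.0))
```

The same masks can be applied to `ent` to obtain the entropy histograms in the left panel of Figure 2.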