Enhanced Online Test-time Adaptation with Feature-Weight Cosine Alignment
WeiQin Chuah, Ruwan Tennakoon, Alireza Bab-Hadiashar
TL;DR
The paper tackles Online Test-Time Adaptation (OTTA) under distributional shifts without access to source data, addressing the shortcomings of entropy minimization (EM) through a cosine-based alignment strategy. It introduces CoMM (Cosine Max-Min), a dual-objective loss that maximizes the cosine similarity between target features and the predicted class weight while suppressing alignment with non-predicted classes, with the core expression $\mathcal{L}_{CoMM} = -\frac{1}{N}\sum_{i=1}^{N}\log{\frac{\cos(\theta_{\omega_{\hat{c}_i}, z_i})}{\sum_j \cos(\theta_{\omega_j, z_i})}}$. The approach formalizes the OTTA problem setup, analyzes EM's gradient ambiguities, and demonstrates CoMM's superiority through extensive experiments on CIFAR-10-C, CIFAR-100-C, ImageNet-C, Office-Home, and DomainNet, including ablations that confirm the necessity of the dual-objective design. Results show consistent, state-of-the-art improvements across corruptions and domain shifts, with robust performance at varying batch sizes. Overall, CoMM provides a practical and effective OTTA solution that robustly aligns target representations with source classifier weights, enhancing both precision and adaptability in real-world deployment.
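As a rough illustration of the expression above, the following NumPy sketch evaluates the loss for a batch of features and a matrix of class weights. The function names `cosine_matrix` and `comm_loss` are hypothetical. Taking the predicted class $\hat{c}_i$ as the class with the highest cosine, and clipping cosines to a small positive floor `eps` so the logarithm stays defined, are both assumptions of this sketch rather than details stated in the summary.

```python
import numpy as np


def cosine_matrix(features, weights, tiny=1e-12):
    """Cosine similarity between every feature row and every class-weight row."""
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + tiny)
    w = weights / (np.linalg.norm(weights, axis=1, keepdims=True) + tiny)
    return f @ w.T  # shape (N, C)


def comm_loss(features, weights, eps=1e-6):
    """Literal reading of the TL;DR expression (a sketch, not the authors' code)."""
    cos = cosine_matrix(features, weights)
    # Assumption: clip to a positive floor so log() is defined for
    # zero or negative cosines; the summary does not say how these are handled.
    cos = np.clip(cos, eps, None)
    # Assumption: the predicted class is the one with the largest cosine.
    pred = np.argmax(cos, axis=1)
    ratio = cos[np.arange(cos.shape[0]), pred] / cos.sum(axis=1)
    return float(-np.mean(np.log(ratio)))


if __name__ == "__main__":
    weights = np.eye(3)                           # three toy class weight vectors
    aligned = np.array([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0]])         # features on a class direction
    ambiguous = np.ones((1, 3))                   # equally close to all classes
    print(comm_loss(aligned, weights), comm_loss(ambiguous, weights))
```

Under these assumptions, the loss is near zero when each feature points along a single class weight and grows as the feature becomes equally similar to several classes, which matches the max-min intent described above.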
Abstract
Online Test-Time Adaptation (OTTA) has emerged as an effective strategy for handling distributional shifts, allowing pre-trained models to adapt on the fly to new target domains during inference without access to source data. We find that the widely studied entropy minimization (EM) method for OTTA suffers from noisy gradients caused by ambiguity near decision boundaries and by incorrect low-entropy predictions. To overcome these limitations, this paper introduces a novel cosine alignment optimization approach with a dual-objective loss function. Specifically, our method optimizes the cosine similarity between feature vectors and class weight vectors, enhancing the precision of class predictions and the model's adaptability to novel domains. Our method outperforms state-of-the-art techniques and sets a new benchmark on multiple datasets, including CIFAR-10-C, CIFAR-100-C, ImageNet-C, Office-Home, and DomainNet, demonstrating high accuracy and robustness against diverse corruptions and domain shifts.
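The concern about incorrect low-entropy predictions can be seen in a small numerical sketch. Entropy depends only on the shape of the predicted distribution, not on which class is correct, so a confident wrong prediction yields the same low EM loss as a confident right one. The logits below are made-up toy values and the helper names are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def entropy(logits):
    """Shannon entropy of the softmax distribution for each row of logits."""
    p = softmax(logits)
    return -(p * np.log(p + 1e-12)).sum(axis=1)


# Toy logits (illustrative values only). Suppose the true class is 0 for all rows.
logits = np.array([
    [8.0, 0.0, 0.0],   # confident and correct
    [0.0, 8.0, 0.0],   # equally confident, but wrong
    [1.0, 0.9, 0.0],   # near a decision boundary
])
h = entropy(logits)

if __name__ == "__main__":
    print(h)
```

The first two rows receive identical entropy even though only one is correct, while the third row, near the boundary between classes 0 and 1, has much higher entropy, and a small perturbation can flip which class entropy minimization reinforces.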
