NEO: No-Optimization Test-Time Adaptation through Latent Re-Centering
Alexander Murphy, Michal Danilowski, Soumyajit Chatterjee, Abhirup Ghosh
TL;DR
NEO addresses distribution shift in vision transformers by re-centering test-time embeddings at the origin using a global centroid estimate, an approach grounded in neural-collapse geometry. It is hyperparameter-free and optimization-free, and it adds negligible compute over standard inference. It achieves higher accuracy and better calibration across multiple datasets (ImageNet-C, CIFAR-10-C, ImageNet-R, ImageNet-Sketch) and ViT sizes, and it performs strongly on edge devices. Replacing the final Linear layer with a lightweight NEO mechanism enables adaptation from as few as 1 sample or 1 class, and a continual variant handles evolving shifts. Together, these results advance practical, resource-efficient TTA and provide insight into latent-space structure under domain shift.
Abstract
Test-Time Adaptation (TTA) methods are often computationally expensive, require a large amount of data for effective adaptation, or are brittle to hyperparameters. Building on a theoretical analysis of latent-space geometry, we show that re-centering target data embeddings at the origin significantly improves the alignment between source and distribution-shifted samples. This insight motivates NEO, a hyperparameter-free, fully TTA method that adds no significant compute compared to vanilla inference. NEO improves the classification accuracy of ViT-Base on ImageNet-C from 55.6% to 59.2% after adapting on just one batch of 64 samples. When adapting on 512 samples, NEO beats all 7 TTA methods we compare against on ImageNet-C, ImageNet-R and ImageNet-Sketch, and beats 6 of the 7 on CIFAR-10-C, while using the least compute. NEO also performs well on model calibration metrics and can adapt from 1 class to improve accuracy on the 999 other classes in ImageNet-C. On Raspberry Pi and Jetson Orin Nano devices, NEO reduces inference time by 63% and memory usage by 9% compared to baselines. Our results on 3 ViT architectures and 4 datasets show that NEO can be used efficiently and effectively for TTA.
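The re-centering idea described above can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the class name RecenteredLinearHead, the use of a plain running mean as the global centroid estimate, and the toy 2-dimensional embeddings are all assumptions made for illustration. It shows only the general pattern of subtracting an estimated centroid of test embeddings before applying an unchanged linear classifier, with no gradient steps and no hyperparameters.

```python
from typing import List, Sequence


class RecenteredLinearHead:
    """Hypothetical wrapper (not from the paper): keeps a running mean of
    test-time embeddings and subtracts it before a fixed linear layer."""

    def __init__(self, weight: Sequence[Sequence[float]], bias: Sequence[float]):
        self.weight = [list(row) for row in weight]   # shape: classes x dim
        self.bias = list(bias)                        # shape: classes
        self.centroid = [0.0] * len(self.weight[0])   # assumed centroid estimate
        self.count = 0

    def update(self, batch: Sequence[Sequence[float]]) -> None:
        # Incremental mean over every embedding seen so far (no optimization).
        for z in batch:
            self.count += 1
            for j, value in enumerate(z):
                self.centroid[j] += (value - self.centroid[j]) / self.count

    def logits(self, z: Sequence[float]) -> List[float]:
        # Re-center the embedding, then apply the unchanged linear classifier.
        centered = [v - c for v, c in zip(z, self.centroid)]
        return [sum(w * x for w, x in zip(row, centered)) + b
                for row, b in zip(self.weight, self.bias)]

    def predict(self, batch: Sequence[Sequence[float]]) -> List[int]:
        self.update(batch)
        preds = []
        for z in batch:
            scores = self.logits(z)
            preds.append(scores.index(max(scores)))
        return preds


# Toy setup: class 0 sits at x = +1 and class 1 at x = -1 in the source domain.
weight = [[1.0, 0.0], [-1.0, 0.0]]
bias = [0.0, 0.0]

# Assumed shift: every embedding is displaced by +3 along the first axis.
shifted = [[4.0, 0.2], [2.0, -0.1], [4.0, -0.2], [2.0, 0.1]]

# Baseline: centroid stays at zero, so this is the vanilla linear head.
vanilla = RecenteredLinearHead(weight, bias)
baseline = []
for z in shifted:
    scores = vanilla.logits(z)
    baseline.append(scores.index(max(scores)))

head = RecenteredLinearHead(weight, bias)
preds = head.predict(shifted)

print(baseline)  # [0, 0, 0, 0]: the shift pushes everything into class 0
print(preds)     # [0, 1, 0, 1]: class structure recovered after re-centering
```

In this toy case the shift is a pure translation, so subtracting the mean of the test embeddings undoes it exactly. Real distribution shifts are not pure translations, and how NEO estimates and applies the centroid in practice is specified in the paper itself rather than by this sketch.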
