IT$^3$: Idempotent Test-Time Training
Nikita Durasov, Assaf Shocher, Doruk Oner, Gal Chechik, Alexei A. Efros, Pascal Fua
TL;DR
IT$^3$ (Idempotent Test-Time Training) adapts to distribution shifts on-the-fly using only the current test input. By enforcing an idempotence-based objective and using a frozen (or EMA) anchor during test-time updates, the method replaces domain-specific auxiliary tasks with a universal regularizer that pulls OOD representations toward the training distribution. The approach yields consistent improvements across diverse tasks (image classification, segmentation, age prediction, aerodynamics) and architectures (MLPs, CNNs, GNNs), including large-scale ImageNet-C, while maintaining practical inference costs. The results reveal a strong link between idempotence and prediction confidence, suggesting idempotence as a general principle for robust test-time adaptation.
Abstract
Deep learning models often struggle when deployed in real-world settings due to distribution shifts between training and test data. While existing approaches like domain adaptation and test-time training (TTT) offer partial solutions, they typically require additional data or domain-specific auxiliary tasks. We present Idempotent Test-Time Training (IT$^3$), a novel approach that enables on-the-fly adaptation to distribution shifts using only the current test instance, without any auxiliary task design. Our key insight is that enforcing idempotence (the property that repeated applications of a function yield the same result) can effectively replace the domain-specific auxiliary tasks used in previous TTT methods. We theoretically connect idempotence to prediction confidence and demonstrate that minimizing the distance between successive applications of our model during inference leads to improved out-of-distribution performance. Extensive experiments across diverse domains (including image classification, aerodynamics prediction, and aerial segmentation) and architectures (MLPs, CNNs, GNNs) show that IT$^3$ consistently outperforms existing approaches while being simpler and more widely applicable. Our results suggest that idempotence provides a universal principle for test-time adaptation that generalizes across domains and architectures.
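
The idea of minimizing the distance between successive applications of the model can be illustrated with a small Python sketch. This is a toy under stated assumptions, not the authors' implementation. The scalar linear model, the neutral placeholder value `NEUTRAL`, the squared-error distance, the plain gradient-descent update, and all function names (`predict`, `idempotence_gap`, `adapt`) are hypothetical choices made for illustration. The sketch assumes the model takes the input together with a prediction slot, so it can be applied a second time to its own output, and that the second application uses a frozen copy of the trained weights as the anchor.

```python
# Toy sketch of an idempotence-based test-time update (illustrative only).
# Assumptions not taken from the abstract: the scalar linear model, the
# neutral placeholder value 0.0, the squared-error distance, and plain
# gradient descent with hand-derived gradients.

NEUTRAL = 0.0  # hypothetical "no prediction yet" signal for the second input


def predict(weights, x, y_in):
    """Toy model f(x, y_in) = w_x * x + w_y * y_in + b."""
    w_x, w_y, b = weights
    return w_x * x + w_y * y_in + b


def idempotence_gap(weights, anchor, x):
    """Difference between two successive applications of the model."""
    y0 = predict(weights, x, NEUTRAL)  # first pass, adaptable weights
    y1 = predict(anchor, x, y0)        # second pass, frozen anchor
    return y1 - y0


def adapt(trained, x, steps=20, lr=0.1):
    """Minimise gap**2 for a single test input, starting from trained weights."""
    anchor = tuple(trained)            # frozen copy, never updated
    w_x, w_y, b = trained
    for _ in range(steps):
        gap = idempotence_gap((w_x, w_y, b), anchor, x)
        # With NEUTRAL = 0: y0 = w_x * x + b and y1 = a_x * x + a_y * y0 + a_b,
        # so d(gap**2)/d(y0) = 2 * gap * (a_y - 1).
        dloss_dy0 = 2.0 * gap * (anchor[1] - 1.0)
        w_x -= lr * dloss_dy0 * x      # dy0/dw_x = x
        b -= lr * dloss_dy0            # dy0/db = 1
        # w_y receives no gradient here because dy0/dw_y = NEUTRAL = 0.
    return (w_x, w_y, b)


trained = (1.0, 0.5, 0.0)              # made-up "trained" weights
x_test = 2.0                           # made-up test input
adapted = adapt(trained, x_test)
gap_before = abs(idempotence_gap(trained, trained, x_test))
gap_after = abs(idempotence_gap(adapted, trained, x_test))
print(f"gap before: {gap_before:.4f}, gap after: {gap_after:.4f}")
```

In this toy, adaptation starts from the trained weights for each test input, which reflects the abstract's statement that only the current test instance is used. Whether weights are reset between inputs, and how gradients flow through the second application, are design choices that this sketch fixes arbitrarily.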
