GeneralizeFormer: Layer-Adaptive Model Generation across Test-Time Distribution Shifts
Sameer Ambekar, Zehao Xiao, Xiantong Zhen, Cees G. M. Snoek
TL;DR
The paper tackles test-time domain generalization, where target distributions are unseen during training. It introduces GeneralizeFormer, a lightweight transformer that generates target-specific Batch Normalization (BN) affine parameters and classifier weights for each target batch, conditioned on the source-trained weights, the target features, and layer-wise gradients. Convolutional weights stay fixed to limit computation. The transformer is trained with a meta-learning scheme that simulates distribution shifts, so adaptation happens on the fly without backpropagating through the backbone. Across six domain-generalization benchmarks, including dynamic and multi-target scenarios, the results show improved robustness to various distribution shifts, reduced forgetting of source information, and faster inference than fine-tuning-based methods, supporting per-batch, layer-aware model generation as a practical option for deployment under distribution shift.
Abstract
We consider the problem of test-time domain generalization, where a model is trained on several source domains and adapted to target domains never seen during training. Unlike common methods that fine-tune the model or adjust the classifier parameters online, we propose to generate the parameters of multiple layers on the fly during inference with a lightweight meta-learned transformer, which we call GeneralizeFormer. The layer-wise parameters are generated per target batch without fine-tuning or online adjustment. As a result, our method is more effective in dynamic scenarios with multiple target distributions and avoids forgetting valuable source distribution characteristics. Moreover, by considering layer-wise gradients, the proposed method adapts itself to various distribution shifts. To reduce the computational and time cost, we fix the convolutional parameters and generate only the parameters of the Batch Normalization layers and the linear classifier. Experiments on six widely used domain generalization datasets demonstrate the benefits of the proposed method and its ability to handle various distribution shifts efficiently, generalize in dynamic scenarios, and avoid forgetting.
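To make the per-batch idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The function names (`batch_stats`, `generate_bn_affine`, `batch_norm`, `predict_batch`) and the `strength` parameter are hypothetical, and the generator is a hand-written toy rule standing in for the meta-learned transformer; it conditions only on source parameters and batch statistics, and omits the layer-wise gradients and the classifier weights. The point it illustrates is that fresh BN affine parameters are produced for each target batch while the source parameters are never overwritten.

```python
import math


def batch_stats(features):
    """Per-channel mean and (biased) variance over a batch of feature rows."""
    n = len(features)
    channels = len(features[0])
    means = [sum(row[c] for row in features) / n for c in range(channels)]
    variances = [
        sum((row[c] - means[c]) ** 2 for row in features) / n
        for c in range(channels)
    ]
    return means, variances


def generate_bn_affine(source_gamma, source_beta, features, strength=0.1):
    """Toy stand-in (assumption) for the meta-learned generator.

    Returns new (gamma, beta) lists conditioned on the source parameters and
    the statistics of the current batch. The source lists are not modified.
    """
    means, variances = batch_stats(features)
    gamma = [
        g * (1.0 + strength * math.tanh(v - 1.0))
        for g, v in zip(source_gamma, variances)
    ]
    beta = [b - strength * math.tanh(m) for b, m in zip(source_beta, means)]
    return gamma, beta


def batch_norm(features, gamma, beta, eps=1e-5):
    """Standard batch normalization using the supplied affine parameters."""
    means, variances = batch_stats(features)
    return [
        [
            gamma[c] * (row[c] - means[c]) / math.sqrt(variances[c] + eps) + beta[c]
            for c in range(len(row))
        ]
        for row in features
    ]


def predict_batch(features, source_gamma, source_beta):
    """Generate parameters for this batch only, then apply them."""
    gamma, beta = generate_bn_affine(source_gamma, source_beta, features)
    return batch_norm(features, gamma, beta)


# Two batches from different (made-up) target distributions.
source_gamma, source_beta = [1.0, 1.0], [0.0, 0.0]
batch_a = [[0.0, 1.0], [2.0, 3.0]]
batch_b = [[10.0, -5.0], [14.0, -1.0]]
out_a = predict_batch(batch_a, source_gamma, source_beta)
out_b = predict_batch(batch_b, source_gamma, source_beta)
```

Because every call starts again from the unchanged source parameters, the order in which batches from different target distributions arrive does not affect the result for any single batch, which is the property the abstract relies on for dynamic scenarios and for avoiding forgetting.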
