Unsupervised Parameter Efficient Source-free Post-pretraining
Abhishek Jha, Tinne Tuytelaars, Yuki M. Asano
TL;DR
UpStep tackles unsupervised, source-free domain adaptation for large vision models by post-pretraining pretrained backbones with a self-supervised two-stream clustering objective. This objective is augmented with center vector regularization to curb forgetting and with a gating mechanism that skips ineffective updates. Parameter efficiency comes from LoRA applied to the QKV projections. At evaluation time, performance is further boosted by an ensemble of base and UpStep features. Across eight diverse target domains and multiple base architectures, UpStep achieves competitive or superior representations while optimizing only a small fraction of the parameters and reducing training time. The approach generalizes robustly and offers a practical path for adapting foundation models to varied visual domains without access to source data or labels.
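The TL;DR does not specify how the base and UpStep features are ensembled. As an illustration only, the NumPy sketch below assumes one plausible scheme. It L2-normalizes each stream separately, concatenates the two, re-normalizes the result, and evaluates with a cosine k-NN classifier. The function names (`ensemble_features`, `knn_predict`) and the k-NN protocol are hypothetical. They are not the authors' evaluation code.

```python
import numpy as np

def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    # Row-wise L2 normalization; eps guards against division by zero.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def ensemble_features(base: np.ndarray, adapted: np.ndarray) -> np.ndarray:
    # Hypothetical ensemble: normalize each stream so neither dominates by scale,
    # concatenate, then re-normalize so a dot product is a cosine similarity.
    both = np.concatenate([l2_normalize(base), l2_normalize(adapted)], axis=1)
    return l2_normalize(both)

def knn_predict(train_x: np.ndarray, train_y: np.ndarray,
                test_x: np.ndarray, k: int = 3) -> np.ndarray:
    # Cosine k-NN with majority vote; rows of train_x/test_x must be unit norm.
    sims = test_x @ train_x.T                  # (n_test, n_train) similarities
    nn = np.argsort(-sims, axis=1)[:, :k]      # indices of the k most similar rows
    votes = train_y[nn]                        # (n_test, k) neighbor labels
    return np.array([np.bincount(v).argmax() for v in votes])

# Toy usage: 6 samples with 4-dim "base" and 3-dim "UpStep" features.
rng = np.random.default_rng(0)
feats = ensemble_features(rng.standard_normal((6, 4)), rng.standard_normal((6, 3)))
print(feats.shape)  # (6, 7)
```

Each stream is normalized before concatenation, and the result is then re-normalized. As a consequence, the cosine similarity between two ensembled vectors equals the average of their base-model and UpStep cosine similarities. Neither backbone therefore dominates merely because its features have a larger scale or dimension.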
Abstract
Following the success of large models in NLP, the best vision models now have billions of parameters. Adapting these large models to a target distribution has become computationally and economically prohibitive. To address this challenge, we introduce UpStep, an Unsupervised Parameter-efficient Source-free post-pretraining approach designed to efficiently adapt a base model from a source domain to a target domain. i) We design a self-supervised training scheme that adapts a pretrained model on an unlabeled target domain in a setting where source domain data is unavailable. Such a source-free setting carries the risk of catastrophic forgetting. ii) We therefore propose center vector regularization (CVR), a set of auxiliary operations that minimize catastrophic forgetting. CVR additionally reduces the computational cost by skipping backpropagation in 50% of the training iterations. iii) Finally, we perform this adaptation in a parameter-efficient way by adapting the pretrained model through low-rank adaptation methods, so that only a fraction of the parameters needs to be optimized. We use various general backbone architectures, both supervised and unsupervised, trained on ImageNet as our base models. We adapt them to a diverse set of eight target domains, demonstrating the adaptability and generalizability of our proposed approach.
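The following NumPy sketch makes the parameter savings of low-rank adaptation concrete. It applies a LoRA-style update to a frozen, fused QKV projection, the placement named in the TL;DR. Several details follow the common LoRA recipe and are assumptions, not UpStep's reported configuration: the hypothetical class name `LoRAQKV`, rank 8, the scaling `alpha / rank`, and the zero initialization of `B`. The frozen weight is a random stand-in for pretrained weights, and training code is omitted.

```python
import numpy as np

class LoRAQKV:
    """Hypothetical sketch: frozen fused QKV projection plus a trainable low-rank update."""

    def __init__(self, dim: int, rank: int, alpha: float = 1.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Frozen weight (random stand-in for pretrained values): maps dim -> 3*dim (q, k, v).
        self.W = rng.standard_normal((3 * dim, dim)) / np.sqrt(dim)
        # Trainable low-rank factors. B starts at zero, so before training the
        # adapted layer reproduces the frozen projection exactly.
        self.A = rng.standard_normal((rank, dim)) * 0.01
        self.B = np.zeros((3 * dim, rank))
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (tokens, dim) -> (tokens, 3*dim): frozen path plus scaled low-rank path.
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self) -> np.ndarray:
        # The update can be folded into W after training, adding no inference cost.
        return self.W + self.scale * self.B @ self.A

    def trainable_params(self) -> int:
        return self.A.size + self.B.size

    def frozen_params(self) -> int:
        return self.W.size

layer = LoRAQKV(dim=768, rank=8)  # 768 is the ViT-B hidden size
print(layer.trainable_params(), layer.frozen_params())  # 24576 1769472
```

For this ViT-B-sized layer, rank 8 adds 8 × 768 + 2304 × 8 = 24,576 trainable parameters against 1,769,472 frozen ones. That is exactly 1/72 (about 1.4%) of the layer's weights, ignoring biases.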
