Unsupervised Parameter Efficient Source-free Post-pretraining

Abhishek Jha, Tinne Tuytelaars, Yuki M. Asano

TL;DR

UpStep tackles unsupervised, source-free domain adaptation for large vision models by post-pretraining pretrained backbones with a self-supervised two-stream clustering objective, augmented with center-vector regularization to curb forgetting and a gating mechanism to skip ineffective updates. Parameter efficiency is achieved via LoRA applied to QKV projections, and performance is further boosted by an ensemble of base and UpStep features during evaluation. Across eight diverse target domains and multiple base architectures, UpStep achieves competitive or superior representations while dramatically reducing trainable parameters and training time. The approach demonstrates robust generalization and offers a practical path for adapting foundation models to varied visual domains without access to source data or labels.

Abstract

Following the success in NLP, the best vision models are now in the billion-parameter range. Adapting these large models to a target distribution has become computationally and economically prohibitive. Addressing this challenge, we introduce UpStep, an Unsupervised Parameter-efficient Source-free post-pretraining approach, designed to efficiently adapt a base model from a source domain to a target domain: i) we design a self-supervised training scheme to adapt a pretrained model on an unlabeled target domain in a setting where source domain data is unavailable. Such a source-free setting comes with the risk of catastrophic forgetting; hence, ii) we propose center vector regularization (CVR), a set of auxiliary operations that minimizes catastrophic forgetting and additionally reduces the computational cost by skipping backpropagation in 50% of the training iterations. Finally, iii) we perform this adaptation process in a parameter-efficient way by adapting the pretrained model through low-rank adaptation methods, resulting in a fraction of parameters to optimize. We utilize various general backbone architectures, both supervised and unsupervised, trained on ImageNet as our base model and adapt them to a diverse set of eight target domains, demonstrating the adaptability and generalizability of our proposed approach.
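To make the "fraction of parameters" claim concrete, the following is a minimal arithmetic sketch of how a low-rank update shrinks the trainable parameter count of a single projection. The layer shape (a fused QKV projection of width 768) and the rank of 8 are illustrative assumptions, not values reported in the paper, and the helper names are hypothetical.

```python
# Illustrative arithmetic only: the layer shape and rank are assumptions,
# not values reported in the paper.

def full_params(d_in: int, d_out: int) -> int:
    """Weights in a dense d_in x d_out projection (bias ignored)."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights in a low-rank update B @ A,
    with A of shape (rank, d_in) and B of shape (d_out, rank)."""
    return rank * (d_in + d_out)


# Hypothetical fused QKV projection of a ViT-style block, adapted at rank 8.
d_in, d_out, rank = 768, 3 * 768, 8
full = full_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)
print(f"dense: {full}, low-rank: {lora}, reduction: {full // lora}x")
```

Under these assumed shapes, the low-rank update trains 24,576 weights in place of 1,769,472, a 72-fold reduction for that one layer. The overall saving for a full model depends on which layers are adapted and on the chosen rank.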

Paper Structure

This paper contains 20 sections, 7 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Proposed UpStep architecture. During training, we pass augmented views of the input images through the online and offline streams. The two streams are identical in architecture, each consisting of a LoRA-adapted pretrained base model. The encoded representations are projected to the prototype space, where an online clustering loss is applied. Center vector regularization (shaded region) comprises an auxiliary loss, learning rate regularization, and a gating mechanism that skips training for certain iterations, conditioned on the magnitude of the center vector. For each layer in the ViT model, we adapt the QKV matrices and the projection layers. During evaluation, we use only the LoRA-adapted target-domain base network, in ensemble with the source-domain base model.
  • Figure 2: Impact of Center Vector Magnitude on Model Performance. Higher center vector magnitudes correlate with reduced k-NN accuracy for the majority of the datasets, underscoring the stabilizing role of center vector regularization.
  • Figure 3: Impact of Center Vector Regularization on Catastrophic Forgetting. Bar plots show the difference in accuracy between the ensembled model and the un-ensembled model. Red bars correspond to the UpStep model with center vector (CV) regularization, while blue bars represent the ablated version of UpStep without CV regularization.
  • Figure 4: Effect of Number of Prototypes on Model Performance. Performance of the model with varying numbers of prototypes in the non-ensemble setting, on the Flowers102 dataset (Nilsback and Zisserman, 2008).
  • Figure 5: Training Time Reduction with Center Vector Conditional Training. (a) Average reduction in training time across datasets. (b) Performance comparison between UpStep with and without center vector conditioned gating (CV cond). With comparable performance across the datasets, UpStep with CV-conditioned gating reduces the number of training iterations that require backpropagation to 50%.
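
The gating idea described in the captions of Figures 1 and 5 can be sketched as a check on the center vector's magnitude before each backward pass. The sketch below is an assumption-laden illustration, not the paper's procedure: the exponential-moving-average update, the momentum value, the threshold, and the direction of the gate (train only while the magnitude stays at or below the threshold) are all hypothetical, as are the function names.

```python
import math


def update_center(center, batch_mean, momentum):
    """Exponential moving average of the center vector (assumed update rule)."""
    return [momentum * c + (1.0 - momentum) * m
            for c, m in zip(center, batch_mean)]


def magnitude(vec):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(v * v for v in vec))


def should_backprop(center, threshold):
    """Assumed gate: run the backward pass only while the center
    magnitude is at or below a hypothetical threshold."""
    return magnitude(center) <= threshold


def count_trained_steps(batch_means, threshold, momentum=0.9):
    """Walk over per-iteration batch means and count the iterations
    in which the gate would allow backpropagation."""
    center = [0.0] * len(batch_means[0])
    trained = 0
    for batch_mean in batch_means:
        center = update_center(center, batch_mean, momentum)
        if should_backprop(center, threshold):
            trained += 1  # forward and backward pass would run here
        # otherwise only the forward pass runs and the update is skipped
    return trained, len(batch_means)


# Toy input: momentum 0.0 makes the center equal to the latest batch mean.
toy_means = [[0.1, 0.0], [3.0, 4.0], [0.0, 0.2], [6.0, 8.0]]
trained, total = count_trained_steps(toy_means, threshold=1.0, momentum=0.0)
print(f"backward passes: {trained} of {total}")
```

On this toy input the gate allows two of the four backward passes. That ratio is a property of the hand-picked numbers, not a reproduction of the reported 50% reduction.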