Learn to Preserve and Diversify: Parameter-Efficient Group with Orthogonal Regularization for Domain Generalization

Jiajun Hu, Jian Zhang, Lei Qi, Yinghuan Shi, Yang Gao

TL;DR

This work tackles domain generalization by leveraging large pre-trained vision transformers without full fine-tuning. It introduces Parameter-Efficient Group with Orthogonal Regularization (PEGO), which injects a group of LoRA modules and imposes two orthogonal regularization losses: one preserves the pre-trained feature space, and the other diversifies the learned representations. Together they improve out-of-distribution generalization. The method achieves state-of-the-art results on five DomainBed DG benchmarks with few additional trainable parameters and no extra testing cost, and its analyses validate the benefits of the preservation and diversification strategies as well as weight-space orthogonality. PEGO demonstrates that carefully regularized, parameter-efficient fine-tuning of foundation models can substantially enhance domain generalization in vision tasks while remaining resource-efficient and scalable to larger backbones.

Abstract

Domain generalization (DG) aims to prevent performance degradation when there is a distribution shift between the limited training data and unseen test data. Recently, foundation models with enormous numbers of parameters have been pre-trained on huge datasets, demonstrating strong generalization ability and offering a promising direction for solving the DG problem. However, full fine-tuning (FT) of foundation models yields unsatisfactory out-of-distribution accuracy because it destroys the pre-trained generalized features. Parameter-Efficient Fine-Tuning (PEFT) alleviates this problem by fine-tuning a small portion of the model parameters while keeping the rest frozen, which achieves better generalization performance than FT. Nevertheless, PEFT still suffers from overfitting to the training domains. To address this issue, we propose Parameter-Efficient Group with Orthogonal regularization (PEGO) for vision transformers, which effectively preserves the generalization ability of the pre-trained network and learns more diverse knowledge than conventional PEFT. Specifically, we inject a group of trainable Low-Rank Adaptation (LoRA) modules into the pre-trained model and propose an orthogonal regularization loss to enhance the generalization ability of the model. Our framework achieves SOTA performance on five DG benchmarks while training only a small number of parameters and adding no testing cost.
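
To make the idea of a LoRA group with orthogonal regularization more concrete, the following is a minimal NumPy sketch, not the authors' implementation. The exact loss formulas are not given in this summary, so the squared-Frobenius-norm penalties below are an assumption, as are the function names (`preserve_loss`, `diversify_loss`, `merged_weight`), the toy shapes, and the value of the balancing coefficient. The last function illustrates why low-rank updates of this kind need not add testing cost: they can be folded into the frozen weight.

```python
import numpy as np


def preserve_loss(w0, a_group):
    """Assumed form: penalize overlap between the frozen weight W0
    (d_out x d_in) and each LoRA down-projection A_i (r x d_in)."""
    return sum(np.sum((w0 @ a.T) ** 2) for a in a_group) / len(a_group)


def diversify_loss(a_group):
    """Assumed form: penalize overlap between every pair of distinct
    LoRA down-projections A_i and A_j in the group."""
    total = 0.0
    for i in range(len(a_group)):
        for j in range(i + 1, len(a_group)):
            total += np.sum((a_group[i] @ a_group[j].T) ** 2)
    return total


def merged_weight(w0, a_group, b_group):
    """Fold all low-rank updates B_i A_i into the frozen weight, so that
    inference uses a single matrix of the original shape."""
    return w0 + sum(b @ a for a, b in zip(a_group, b_group))


# Toy example with hypothetical sizes.
rng = np.random.default_rng(0)
d_out, d_in, r, n = 6, 8, 2, 3
w0 = rng.normal(size=(d_out, d_in))                       # frozen weight
a_group = [rng.normal(size=(r, d_in)) for _ in range(n)]  # trainable A_i
b_group = [np.zeros((d_out, r)) for _ in range(n)]        # zero-initialized B_i

alpha = 1e-3  # hypothetical balancing coefficient
reg = alpha * (preserve_loss(w0, a_group) + diversify_loss(a_group))
```

In a training loop, a term like `reg` would be added to the task loss. Both penalties reach zero only when the relevant row spaces are mutually orthogonal.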

Paper Structure

This paper contains 28 sections, 10 equations, 4 figures, 14 tables, 1 algorithm.

Figures (4)

  • Figure 1: Illustration of our method, Parameter-Efficient Group with Orthogonal regularization (PEGO). Unlike previous DG work that updates all the parameters of the pre-trained model, we freeze the parameters of the model and inject a group of trainable parameter-efficient modules into it. Moreover, we apply an orthogonal regularization loss between the pre-trained weights and the LoRA modules to preserve the generalization ability of the pre-trained model (Learn to Preserve), and we employ another orthogonal regularization loss on different LoRA modules within the group to encourage them to learn diverse knowledge during training (Learn to Diversify).
  • Figure 2: Leave-one-domain-out accuracy (%) on PACS and OfficeHome for different numbers of LoRA modules $N$, balancing coefficients $\alpha$, and LoRA ranks $r$. Baseline (blue line) indicates injecting a group of LoRA layers into the pre-trained model without applying $\mathcal{L}_{preserve}$ and $\mathcal{L}_{diversify}$ (i.e., the balancing coefficient is zero).
  • Figure 3: Visualization of the feature space (before the classifier) extracted by the FT model, the pre-trained model, the LoRA model, and our model when training on the PACS dataset with art painting as the test domain.
  • Figure 4: Left: Explained variance ratio of the top-10 principal components (PCs) of the LoRA weight and the top-10 PCs of the PEGO weight. Right: Cosine similarity between the top-10 PCs of the pre-trained weight and the top-8 PCs of the PEGO weight.
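
The two quantities reported in Figure 4 can be computed roughly as follows. This is a sketch under stated assumptions, not the authors' analysis code: it treats the rows of a weight matrix as samples, centers them, and takes principal components from an SVD. The helper names `principal_components` and `cosine_similarity_matrix` and the random stand-in matrices are hypothetical.

```python
import numpy as np


def principal_components(weight, k):
    """Return the top-k principal directions of `weight` (rows as samples)
    and their explained variance ratios."""
    centered = weight - weight.mean(axis=0, keepdims=True)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    evr = s ** 2 / np.sum(s ** 2)  # share of variance captured by each PC
    return vt[:k], evr[:k]


def cosine_similarity_matrix(pcs_a, pcs_b):
    """Pairwise cosine similarity between two sets of row vectors."""
    a = pcs_a / np.linalg.norm(pcs_a, axis=1, keepdims=True)
    b = pcs_b / np.linalg.norm(pcs_b, axis=1, keepdims=True)
    return a @ b.T


# Random stand-ins for a pre-trained weight and an adapted weight update.
rng = np.random.default_rng(0)
w_pre = rng.normal(size=(20, 12))
w_new = rng.normal(size=(20, 12))

pcs_pre, evr_pre = principal_components(w_pre, k=10)
pcs_new, evr_new = principal_components(w_new, k=8)
sim = cosine_similarity_matrix(pcs_pre, pcs_new)  # shape (10, 8)
```

Under this reading, a flatter explained-variance profile would suggest that the update spreads over more directions, and similarity entries near zero would suggest that the update's principal directions are close to orthogonal to those of the pre-trained weight.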