Learn to Preserve and Diversify: Parameter-Efficient Group with Orthogonal Regularization for Domain Generalization
Jiajun Hu, Jian Zhang, Lei Qi, Yinghuan Shi, Yang Gao
TL;DR
This work tackles domain generalization by leveraging large pre-trained vision transformers without full fine-tuning. It introduces Parameter-Efficient Group with Orthogonal Regularization (PEGO), which injects a group of LoRA modules and imposes two orthogonal regularization losses: one preserves the pre-trained feature space and the other diversifies the learned representations, yielding improved out-of-distribution generalization. The method achieves state-of-the-art results on five DomainBed DG benchmarks with minimal additional trainable parameters and no extra testing cost, and the accompanying analyses validate the benefits of the preservation and diversification strategies as well as of weight-space orthogonality. PEGO demonstrates that carefully regularized, parameter-efficient fine-tuning of foundation models can substantially enhance domain generalization in vision tasks while remaining resource-efficient and scalable to larger backbones.
Abstract
Domain generalization (DG) aims to prevent performance degradation when the distribution of unseen test data shifts away from that of the limited training data. Foundation models with enormous numbers of parameters, pre-trained on huge datasets, have recently demonstrated strong generalization ability and offer a promising direction for solving the DG problem. However, full Fine-Tuning (FT) of foundation models yields unsatisfactory out-of-distribution accuracy because it destroys the generalized features learned during pre-training. Parameter-Efficient Fine-Tuning (PEFT) alleviates this problem by fine-tuning only a small portion of the model parameters while keeping the rest frozen, and it achieves better generalization performance than FT. Nevertheless, PEFT still overfits to the training domains. To address this issue, we propose Parameter-Efficient Group with Orthogonal regularization (PEGO) for vision transformers, which effectively preserves the generalization ability of the pre-trained network and learns more diverse knowledge than conventional PEFT. Specifically, we inject a group of trainable Low-Rank Adaptation (LoRA) modules into the pre-trained model and propose an orthogonal regularization loss to enhance the generalization ability of the model. Our framework achieves state-of-the-art (SOTA) performance on five DG benchmarks while training only a small number of parameters and adding no testing cost.
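To make the mechanism concrete, the following is a minimal NumPy sketch of a group of LoRA modules attached to one frozen linear layer, with two orthogonality penalties. It is an illustration under stated assumptions, not the authors' implementation: the function names (`init_lora_group`, `preserve_loss`, `diversify_loss`, `merge`) are hypothetical, and the squared-Frobenius form of both penalties is one plausible way to express weight-space orthogonality; the exact losses and their weighting in the paper may differ.

```python
import numpy as np


def init_lora_group(n_modules, rank, d_in, d_out, seed=0):
    """Create n_modules LoRA pairs (A_i, B_i).

    A_i has shape (rank, d_in) and B_i has shape (d_out, rank).
    B_i starts at zero, as in standard LoRA, so training begins
    from the unmodified pre-trained layer.
    """
    rng = np.random.default_rng(seed)
    As = [rng.normal(scale=0.02, size=(rank, d_in)) for _ in range(n_modules)]
    Bs = [np.zeros((d_out, rank)) for _ in range(n_modules)]
    return As, Bs


def forward(W0, As, Bs, x):
    """Frozen path plus the sum of low-rank paths: W0 x + sum_i B_i A_i x."""
    return W0 @ x + sum(B @ (A @ x) for A, B in zip(As, Bs))


def preserve_loss(W0, As):
    """Assumed form: penalize overlap between the rows of W0 and the rows of each A_i.

    The penalty is zero when every A_i reads from directions orthogonal
    to the input directions used by the frozen weight W0.
    """
    return sum(np.sum((W0 @ A.T) ** 2) for A in As)


def diversify_loss(As):
    """Assumed form: penalize overlap between the rows of different LoRA modules."""
    total = 0.0
    for i in range(len(As)):
        for j in range(i + 1, len(As)):
            total += np.sum((As[i] @ As[j].T) ** 2)
    return total


def merge(W0, As, Bs):
    """Fold all low-rank updates into a single weight matrix for inference."""
    return W0 + sum(B @ A for A, B in zip(As, Bs))
```

During training, only the `A_i` and `B_i` matrices would be updated, with the two penalties added to the task loss under weighting coefficients chosen by the practitioner (an assumption here). Because the merged matrix returned by `merge` has the same shape as `W0`, inference uses a single matrix multiplication per layer, which is consistent with the claim of no additional testing cost.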
