Domain Generalization Guided by Large-Scale Pre-Trained Priors

Zongbin Wang; Bin Pan; Shiyu Shen; Tianyang Shi; Zhenwei Shi

TL;DR

This work addresses domain generalization by keeping a large-scale pre-trained model in play as a prior throughout fine-tuning, rather than using it only for initialization. It formulates a PAC-Bayes-based objective (FT-LP) that adds a KL-divergence term tethering the trainable posterior to the pre-trained prior, and provides an encoder-based, MAP-style implementation so the method can be applied when only pre-trained weights are available. Theoretical results give generalization bounds indicating a potential reduction in target-domain error, and empirical evaluations on DomainBed benchmarks show consistent improvements across diverse DG algorithms and datasets. The approach highlights the practical value of referencing pre-trained priors during optimization to improve robustness to domain shift, and points to future work on extending the priors and reducing computational overhead.

Abstract

Domain generalization (DG) aims to train a model from limited source domains, allowing it to generalize to unknown target domains. Typically, DG models only employ large-scale pre-trained models during the initialization of fine-tuning. However, large-scale pre-trained models already possess the ability to resist domain shift. If we reference pre-trained models continuously during fine-tuning to maintain this ability, it could further enhance the generalization ability of the DG model. For this purpose, we introduce a new method called Fine-Tune with Large-scale pre-trained Priors (FT-LP), which incorporates the pre-trained model as a prior into the DG fine-tuning process, ensuring that the model refers to its pre-trained model at each optimization step. FT-LP comprises a theoretical framework and a simple implementation strategy. In theory, we verify the rationality of FT-LP by introducing a generalization error bound with the pre-trained priors for DG. In implementation, we utilize an encoder to simulate the model distribution, enabling the use of FT-LP when only pre-trained weights are available. In summary, we offer a new fine-tuning method for DG algorithms to utilize pre-trained models throughout the fine-tuning process. Through experiments on various datasets and DG models, our proposed method exhibits significant improvements, indicating its effectiveness.
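To make the idea of referring to the pre-trained model at each optimization step concrete, here is a minimal sketch under simplifying assumptions that are not taken from the paper: the posterior and the prior are both Gaussians with the same isotropic variance, centred on the current weights and the pre-trained weights respectively. In that case the KL term reduces to a scaled squared distance between the two weight vectors, so each update adds a pull back toward the pre-trained weights. The function names, the `lam` and `sigma` parameters, and the toy gradient are all hypothetical, and this is not the paper's encoder-based implementation.

```python
def kl_same_variance_gaussians(mu_p, mu_q, sigma):
    """KL(N(mu_p, sigma^2 I) || N(mu_q, sigma^2 I)) = ||mu_p - mu_q||^2 / (2 sigma^2)."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(mu_p, mu_q))
    return sq_dist / (2.0 * sigma ** 2)


def prior_anchored_step(weights, task_grad, prior_weights, lr=0.1, lam=1.0, sigma=1.0):
    """One gradient step on task_loss(w) + lam * KL(posterior || prior).

    Assumption (ours): posterior = N(w, sigma^2 I), prior = N(w_pre, sigma^2 I).
    """
    updated = []
    for w, g, w_pre in zip(weights, task_grad, prior_weights):
        kl_grad = (w - w_pre) / sigma ** 2  # gradient of the KL term w.r.t. w
        updated.append(w - lr * (g + lam * kl_grad))
    return updated


# Toy usage: a constant stand-in task gradient pushes the weights away from
# the prior, while the KL term pulls them back at every step.
w_pre = [1.0, -2.0]   # hypothetical pre-trained weights
w = list(w_pre)       # usual initialization from the pre-trained model
for _ in range(100):
    grad = [1.0, 1.0]  # hypothetical source-domain task gradient
    w = prior_anchored_step(w, grad, w_pre, lr=0.1, lam=1.0, sigma=1.0)
```

In this toy setting the weights settle where the task gradient and the pull toward the prior balance, instead of drifting without bound. A larger `lam` or a smaller `sigma` keeps the fine-tuned weights closer to the pre-trained ones.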

Paper Structure (27 sections, 35 equations, 7 figures, 9 tables, 1 algorithm)

Introduction
Related Work
Fine-Tuning with Large-scale pre-trained Priors
Problem setting
Tightening DG generalization bounds with large-scale pre-trained priors
Implementation for FT-LP
Experiments
Experiment setups and implementation details
Main results
Visualization experiment
Discussion and Limitation
Conclusion
Some Discussion on Theoretical Details
The validity of $dist(S,T,\mathcal{H})$
An example for Proposition 1 and 2
...and 12 more sections

Figures (7)

Figure 1: Figure (a) shows the standard DG fine-tuning algorithm, where the pre-trained model $Q$ is directly fine-tuned to $P_N$. Figure (b) shows our method, which includes $Q$ at every step of fine-tuning. Figure (c) shows the details: we regard $Q$ as the prior for $P$ and adjust $P$ based on the discrepancy between $P$ and $Q$, resulting in $P'$ for further fine-tuning.
Figure 2: Each pair of columns shows visualization results for a different testing domain, and each column corresponds to a specific model. Redder areas indicate the regions the model relies on more heavily for classification.
Figure 3: The horizontal and vertical axes represent $x_{inv}$ and $x_{sup}$ respectively, different shapes represent different categories, and various colors indicate samples used for training different classifiers. The shaded regions depict the confidence intervals generated by prior covariance. For clarity, we omit the confidence intervals of the posterior. The classifier that approaches vertical alignment is preferred.
Figure 4: art painting
Figure 5: cartoon
...and 2 more figures
