Preserving Silent Features for Domain Generalization

Chujie Zhao; Tianren Zhang; Feng Chen

TL;DR

Domain generalization (DG) seeks robust generalization to unseen domains. The authors identify a feature-suppression effect in which silent features learned by self-supervised pre-training are downweighted during supervised fine-tuning, potentially harming DG performance. They model this with a Gaussian DG framework and show that preserving silent features can lower the test risk $R_\text{test}$ under certain conditions. This motivates STEP, which combines linear probing then fine-tuning (LP-FT) with Stochastic Weight Averaging Densely (SWAD) to retain silent features during training. Empirically, STEP-S achieves state-of-the-art or near-state-of-the-art results on five standard DG benchmarks, especially under large distribution shifts, and can be combined with existing DG methods to further improve generalization.

Abstract

Domain generalization (DG) aims to improve the ability of a model trained on several known training domains to generalize to unseen test domains. Previous work has shown that self-supervised contrastive pre-training improves the robustness of the model on downstream tasks. However, in this paper, we find that self-supervised models do not exhibit better generalization performance than supervised models pre-trained on the same dataset in the DG setting. We argue that this is because the richer intra-class discriminative features extracted by self-supervised contrastive learning, which we term silent features, are suppressed during supervised fine-tuning. These silent features are likely to include features that generalize better to the test domain. In this work, we model and analyze this feature suppression phenomenon and theoretically prove that preserving silent features can achieve lower expected test domain risk under certain conditions. In light of this, we propose a simple yet effective method termed STEP (Silent Feature Preservation) to improve the generalization performance of models pre-trained with self-supervised contrastive learning by alleviating the suppression of silent features during supervised fine-tuning. Experimental results show that STEP exhibits state-of-the-art performance on standard DG benchmarks with significant distribution shifts.
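The two ingredients named in the TL;DR, LP-FT and SWAD, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the tiny linear featurizer, the logistic loss, the function name `lp_ft_with_averaging`, and all hyperparameters are assumptions chosen for illustration. In particular, the averaging window here is a fixed step range, whereas SWAD selects its window from the validation loss.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grads(W, v, X, y):
    """Gradients of the mean logistic loss for the model sigmoid((X @ W) @ v)."""
    feats = X @ W                           # (n, k) output of the linear featurizer
    err = sigmoid(feats @ v) - y            # (n,) derivative of the loss w.r.t. the logit
    g_v = feats.T @ err / len(y)            # (k,) gradient for the head
    g_W = X.T @ np.outer(err, v) / len(y)   # (d, k) gradient for the featurizer
    return g_W, g_v

def lp_ft_with_averaging(X, y, W0, lp_steps=200, ft_steps=200,
                         lp_lr=0.5, ft_lr=0.05, avg_start=50):
    """Hypothetical helper: linear probing, then fine-tuning with dense weight averaging."""
    W = W0.copy()
    v = np.zeros(W0.shape[1])

    # Stage 1 (linear probing): the pre-trained featurizer W stays frozen,
    # only the classification head v is trained.
    for _ in range(lp_steps):
        _, g_v = grads(W, v, X, y)
        v -= lp_lr * g_v
    W_after_lp = W.copy()

    # Stage 2 (fine-tuning): update featurizer and head together, and average
    # the weights at every step from avg_start onward (fixed window: an assumption).
    W_sum, v_sum, n_avg = np.zeros_like(W), np.zeros_like(v), 0
    for step in range(ft_steps):
        g_W, g_v = grads(W, v, X, y)
        W -= ft_lr * g_W
        v -= ft_lr * g_v
        if step >= avg_start:
            W_sum += W
            v_sum += v
            n_avg += 1
    return W_sum / n_avg, v_sum / n_avg, W_after_lp
```

Starting fine-tuning from an already fitted head keeps the early gradients on the featurizer small, which is the intuition behind using LP-FT to limit distortion of pre-trained features; averaging the fine-tuning iterates then smooths the final solution.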

Paper Structure (30 sections, 4 theorems, 23 equations, 3 figures, 13 tables)

Introduction
Related Work
Problem Formulation
The Benefits of Preserving Silent Features
Method
Motivation
Silent Feature Preservation
Linear Probing then fine-tuning (LP-FT)
Stochastic Weight Averaging Densely (SWAD)
Experiment
Experimental Settings
Main Experiment Results
Ablation Study
Conclusion
Proofs and Discussion on Theoretical Results
...and 15 more sections

Key Result

Theorem 1

Assume that $\eta = \frac{1}{2}$ and $\sigma_d^2=\sigma_s^2=\sigma^2$. Then, for any $w_d,w_s\in[0,1]$, the predictor $g^*\circ\Phi_{(w_d,w_s)}$, composed of the featurizer $\Phi_{(w_d,w_s)}$ and the training-domain Bayes classifier $g^*$ with respect to $\Phi_{(w_d,w_s)}$, gives the expected test domain risk ... where $F$ is the CDF of a standard Gaussian $\mathcal{N}(0,1)$.
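To make the role of the weights $w_d$, $w_s$ and the Gaussian CDF $F$ concrete, the following toy calculation may help. It is a sketch under our own assumptions and is not the paper's data model or the formula of Theorem 1: labels $y\in\{-1,+1\}$ are equiprobable, the dominant and silent features are independent Gaussians with means $y\mu_d$ and $y\mu_s$ and common variance $\sigma^2$, the featurizer is the scalar mix $w_d z_d + w_s z_s$, and the training-domain classifier is its sign (which assumes both training means are positive). The parameter $\eta$ is not modeled, and the test-domain means `mu_d_test` and `mu_s_test` are hypothetical.

```python
import math
import numpy as np

def std_normal_cdf(x):
    """F(x): CDF of the standard Gaussian N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def toy_test_risk(w_d, w_s, mu_d_test, mu_s_test, sigma):
    """Closed-form test error of sign(w_d*z_d + w_s*z_s) in the toy model.

    Given y, the mixed feature is N(y * margin, scale**2), so the error
    probability is F(-margin / scale). Requires (w_d, w_s) != (0, 0).
    """
    margin = w_d * mu_d_test + w_s * mu_s_test
    scale = sigma * math.sqrt(w_d ** 2 + w_s ** 2)
    return std_normal_cdf(-margin / scale)

def simulate_test_risk(w_d, w_s, mu_d_test, mu_s_test, sigma, n=200_000, seed=0):
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = np.random.default_rng(seed)
    y = rng.choice([-1.0, 1.0], size=n)
    z_d = y * mu_d_test + sigma * rng.normal(size=n)   # dominant feature
    z_s = y * mu_s_test + sigma * rng.normal(size=n)   # silent feature
    pred = np.sign(w_d * z_d + w_s * z_s)
    return float(np.mean(pred != y))

# Hypothetical shift: the dominant feature becomes misleading in the test
# domain (negative mean) while the silent feature stays informative.
suppressed = toy_test_risk(w_d=1.0, w_s=0.1, mu_d_test=-0.5, mu_s_test=1.0, sigma=1.0)
preserved = toy_test_risk(w_d=1.0, w_s=1.0, mu_d_test=-0.5, mu_s_test=1.0, sigma=1.0)
print(f"suppressed silent feature: {suppressed:.3f}, preserved: {preserved:.3f}")
```

In this toy setting, keeping a larger weight on the silent feature yields a lower test risk whenever the silent feature remains informative after the shift; the paper's theorem states its own conditions, which this sketch does not reproduce.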

Figures (3)

Figure 1: Examples of the features used to recognize baseball and basketball in DomainNet (Peng et al., 2019). Dominant features are highlighted in red, whereas silent features are shown in gray.
Figure 2: The causal graph of our data generation model in training and test domains. Shading represents that the variable is observed. Solid lines represent the data generation process, while dashed lines represent feature suppression.
Figure 3: Example of the correlation between OOD generalization performance and the relative weight of silent features; images are from the PACS dataset (Li et al., 2017). The source domain is the "photo" domain (yellow) and the target domain is the "sketch" domain (gray-blue), in which classes are well distinguished along the texture and shape dimensions, respectively. The diagonal line represents the ideal decision boundary between the dog and the elephant, the blue dashed line is the feature space obtained by mixing the two feature dimensions, and the red line is the corresponding empirical decision boundary. The decision boundary fitted on the "photo" domain by the model pre-trained with self-supervised contrastive learning is closer to the ideal boundary and generalizes better to the inaccessible "sketch" domain.

Theorems & Definitions (8)

Definition 1: Feature suppression
Theorem 1: Expected test domain risk
Lemma 1
proof
Lemma 2
proof
Theorem 2: Expected test domain risk, generalized
proof
