Preserving Silent Features for Domain Generalization
Chujie Zhao, Tianren Zhang, Feng Chen
TL;DR
Domain generalization (DG) seeks robust generalization to unseen domains. The authors identify a feature-suppression effect in which the "silent features" learned by self-supervised contrastive pre-training are downweighted during supervised fine-tuning, potentially harming DG performance. They model this with a Gaussian DG framework and show that preserving silent features can lower the test risk $R_\text{test}$ under certain conditions. This motivates STEP, which combines LP-FT and SWAD to retain silent features during training. Empirically, STEP-S achieves state-of-the-art or near-state-of-the-art results on five standard DG benchmarks, especially under large distribution shifts, and is compatible with existing DG methods to further improve generalization.
Abstract
Domain generalization (DG) aims to improve how well a model trained on several known training domains generalizes to unseen test domains. Previous work has shown that self-supervised contrastive pre-training improves the robustness of the model on downstream tasks. However, in this paper, we find that in the DG setting self-supervised models do not generalize better than supervised models pre-trained on the same dataset. We argue that this is because the richer intra-class discriminative features extracted by self-supervised contrastive learning, which we term silent features, are suppressed during supervised fine-tuning. These silent features are likely to include features that generalize better to the test domain. In this work, we model and analyze this feature suppression phenomenon and theoretically prove that preserving silent features can achieve lower expected test domain risk under certain conditions. In light of this, we propose a simple yet effective method termed STEP (Silent Feature Preservation), which improves the generalization performance of models pre-trained with self-supervised contrastive learning by alleviating the suppression of silent features during supervised fine-tuning. Experimental results show that STEP achieves state-of-the-art performance on standard DG benchmarks with significant distribution shifts.
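The TL;DR describes STEP as a combination of LP-FT (linear probing followed by fine-tuning) and SWAD (dense stochastic weight averaging). The following is only a minimal NumPy sketch of that general recipe on a toy linear model, not the authors' implementation. The function name `lp_ft_with_averaging`, the toy logistic model, and all hyperparameters are hypothetical. SWAD selects its averaging window from validation loss; a fixed window (`avg_start`) is assumed here for brevity.

```python
import numpy as np

def lp_ft_with_averaging(X, y, W_backbone, lp_steps=200, ft_steps=200,
                         lr_head=0.1, lr_ft=0.01, avg_start=100):
    """Toy LP-FT with dense weight averaging (illustrative assumptions only).

    X: (n, d_in) inputs, y: (n,) labels in {0, 1},
    W_backbone: (d_in, d_feat) stand-in for a pre-trained feature extractor.
    """
    if avg_start >= ft_steps:
        raise ValueError("avg_start must be smaller than ft_steps")
    n = X.shape[0]
    W = W_backbone.copy()           # backbone weights
    w = np.zeros(W.shape[1])        # linear head, trained from scratch

    def grads(W, w):
        feats = X @ W                              # (n, d_feat)
        p = 1.0 / (1.0 + np.exp(-(feats @ w)))     # sigmoid probabilities
        err = (p - y) / n                          # d(mean logistic loss)/d(logits)
        g_w = feats.T @ err                        # gradient w.r.t. the head
        g_W = X.T @ np.outer(err, w)               # gradient w.r.t. the backbone
        return g_W, g_w

    # Stage 1 (linear probing): backbone frozen, only the head is updated.
    for _ in range(lp_steps):
        _, g_w = grads(W, w)
        w -= lr_head * g_w

    # Stage 2 (fine-tuning): update everything with a smaller learning rate and
    # average the weights of every step inside the (assumed fixed) window.
    W_sum, w_sum, count = np.zeros_like(W), np.zeros_like(w), 0
    for step in range(ft_steps):
        g_W, g_w = grads(W, w)
        W -= lr_ft * g_W
        w -= lr_ft * g_w
        if step >= avg_start:
            W_sum += W
            w_sum += w
            count += 1
    return W_sum / count, w_sum / count
```

The intuition behind the two stages is that starting fine-tuning from an already fitted head avoids large early gradients that would distort the pre-trained features, while averaging many nearby iterates favors a flatter solution. Whether this toy setup reproduces the feature-preservation effect analyzed in the paper is not claimed here.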
