Learning Generalizable Models via Disentangling Spurious and Enhancing Potential Correlations

Na Wang, Lei Qi, Jintao Guo, Yinghuan Shi, Yang Gao

TL;DR

This paper improves model generalization by compelling the model to acquire domain-invariant representations from both the sample and feature perspectives: it disentangles spurious correlations and enhances potential correlations.

Abstract

Domain generalization (DG) intends to train a model on multiple source domains to ensure that it can generalize well to an arbitrary unseen target domain. The acquisition of domain-invariant representations is pivotal for DG as they possess the ability to capture the inherent semantic information of the data, mitigate the influence of domain shift, and enhance the generalization capability of the model. Adopting multiple perspectives, such as the sample and the feature, proves to be effective. The sample perspective facilitates data augmentation through data manipulation techniques, whereas the feature perspective enables the extraction of meaningful generalization features. In this paper, we focus on improving the generalization ability of the model by compelling it to acquire domain-invariant representations from both the sample and feature perspectives by disentangling spurious correlations and enhancing potential correlations. 1) From the sample perspective, we develop a frequency restriction module, guiding the model to focus on the relevant correlations between object features and labels, thereby disentangling spurious correlations. 2) From the feature perspective, the simple Tail Interaction module implicitly enhances potential correlations among all samples from all source domains, facilitating the acquisition of domain-invariant representations across multiple domains for the model. The experimental results show that Convolutional Neural Networks (CNNs) or Multi-Layer Perceptrons (MLPs) with a strong baseline embedded with these two modules can achieve superior results, e.g., an average accuracy of 92.30% on Digits-DG.
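To make the sample-perspective idea concrete, the following is a minimal sketch of a frequency-domain high-pass filter, which suppresses low-frequency content such as smooth backgrounds and keeps edges and shapes. It is not the authors' frequency restriction module: the function name `high_pass_filter`, the `radius` parameter, and the single-step circular mask are all assumptions made for illustration, and the abstract does not specify the exact filter design.

```python
import numpy as np


def high_pass_filter(image: np.ndarray, radius: int) -> np.ndarray:
    """Remove low frequencies from a 2-D grayscale image.

    `radius` is a hypothetical severity parameter: every frequency whose
    distance from the centre of the shifted spectrum is <= radius is zeroed.
    """
    h, w = image.shape
    # Move the zero-frequency (DC) component to the centre of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    # Distance of every frequency bin from the centre.
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    # Zero out the low-frequency disc, keeping only high frequencies.
    spectrum[dist <= radius] = 0
    # Undo the shift and transform back; the mask is symmetric, so the
    # imaginary part is only numerical noise.
    return np.fft.ifft2(np.fft.ifftshift(spectrum)).real
```

Under these assumptions, a constant image is mapped to (numerically) zero because only its DC component is non-zero, while a fine checkerboard pattern passes through unchanged.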

Paper Structure (26 sections, 13 equations, 13 figures, 18 tables, 2 algorithms)


Figures (13)

  • Figure 1: Diagram of disentangling spurious correlations and enhancing potential correlations, aiming to learn domain-invariant representations. The baseline models focus on both objects and backgrounds, while ours focuses mainly on the generalized features of the objects.
  • Figure 2: The architecture using our proposed two modules. The whole network includes two portable modules: frequency restriction (Gaussian Kernel/Two-step High-pass Filter) and Tail Interaction. The network receives both original and augmented images as input. At the end of the network, the Tail Interaction is based on two units, $I_k$ and $I_v$, to establish the potential correlations among different samples.
  • Figure 3: Diagrams of Self-attention and Tail Interaction.
  • Figure 4: Visualization comparison between FACT (Amplitude Mix), BrAD, and our augmentation (Two-step High-pass Filter).
  • Figure 5: Effects of hyper-parameters including filter severity level, scaling factor, interaction unit size, and Gaussian Kernel size. The experiments are conducted on Digits-DG with GFNet-H-Ti as the backbone.
  • ...and 8 more figures
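
The Figure 2 and Figure 3 captions describe Tail Interaction as built on two units, $I_k$ and $I_v$, and contrast it with self-attention. One plausible reading, sketched below, is that the keys and values of self-attention are replaced by two learnable units shared across all samples, so that every sample interacts with the same memory. This is an assumption, not the authors' implementation: the function name `tail_interaction`, the array shapes, and the plain softmax normalization are all hypothetical.

```python
import numpy as np


def tail_interaction(features: np.ndarray,
                     unit_k: np.ndarray,
                     unit_v: np.ndarray) -> np.ndarray:
    """Attend from each sample's features to shared units (assumed form).

    features: (N, d) one feature vector per sample
    unit_k:   (S, d) shared "key" unit, standing in for I_k
    unit_v:   (S, d) shared "value" unit, standing in for I_v
    S is a hypothetical interaction unit size.
    """
    # Similarity of every sample to every slot of the shared key unit.
    scores = features @ unit_k.T                         # (N, S)
    # Numerically stable softmax over the S slots (an assumed normalization).
    scores = scores - scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)
    # Each output is a weighted mix of the shared value unit's rows.
    return attn @ unit_v                                 # (N, d)
```

Because `unit_k` and `unit_v` do not depend on any single sample, they can in principle accumulate information from all samples seen during training, which is one way to read "potential correlations among different samples". The cost is linear in the number of samples, unlike pairwise self-attention.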