
Learning domain-invariant features through channel-level sparsification for Out-of-Distribution Generalization

Haoran Pei, Yuguang Yang, Kexin Liu, Juan Zhang, Baochang Zhang

Abstract

Out-of-Distribution (OOD) generalization has become a primary metric for evaluating image analysis systems. Because deep learning models tend to capture domain-specific context, they often develop shortcut dependencies on these non-causal features, leading to inconsistent performance across different data sources. Current techniques, such as invariance learning, attempt to mitigate this problem, but they struggle to isolate highly entangled features within deep latent spaces and therefore cannot fully resolve shortcut learning. In this paper, we propose Hierarchical Causal Dropout (HCD), a method that uses channel-level causal masks to enforce feature sparsity. This approach allows the model to separate causal features from spurious ones, effectively performing a causal intervention at the representation level. Training is guided by a Matrix-based Mutual Information (MMI) objective that minimizes the mutual information between latent features and domain labels while maximizing the information shared with class labels. To ensure stability, we incorporate a StyleMix-driven VICReg module, which prevents the masks from accidentally filtering out essential causal information. Experimental results on OOD benchmarks show that HCD outperforms existing state-of-the-art methods.
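The channel-level masking idea described above can be sketched numerically. This is a minimal illustration, not the paper's implementation: the top-k gating rule and the names `channel_causal_mask`, `importance`, and `keep_ratio` are our own illustrative assumptions.

```python
import numpy as np

def channel_causal_mask(features, importance, keep_ratio=0.5):
    """Zero out the least-important feature channels.

    features:   (batch, channels) feature matrix
    importance: (channels,) per-channel importance scores
                (in HCD these would be learned; here they are given)
    keep_ratio: fraction of channels the mask retains
    """
    n_channels = features.shape[1]
    k = max(1, int(n_channels * keep_ratio))
    # Indices of the top-k most "causal" channels by importance score.
    keep = np.argsort(importance)[::-1][:k]
    mask = np.zeros(n_channels)
    mask[keep] = 1.0
    # Broadcasting applies the same channel mask to every sample.
    return features * mask, mask

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
scores = np.array([0.9, 0.1, 0.8, 0.05, 0.7, 0.2, 0.6, 0.0])
masked, mask = channel_causal_mask(feats, scores, keep_ratio=0.5)
print(mask)  # channels 0, 2, 4, 6 survive; the rest are zeroed
```

In the full method, the mask would be learned jointly with the MMI objective so that the retained channels carry class information while the suppressed channels carry domain-specific context; the hard top-k gate here simply makes the sparsification step concrete.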

Paper Structure

This paper contains 13 sections, 7 equations, 3 figures, 1 table, and 1 algorithm.

Figures (3)

  • Figure 1: Overview of the proposed HCD framework. Black dots ($\bullet$) denote branching points.
  • Figure 2: Visual comparison of Grad-CAM on the iWildCam dataset. Columns from left to right: HCD, Bonsai, and ERM. HCD maintains precise target localization across complex scenarios such as nocturnal noise, infrared imaging, and severe vegetation occlusion, effectively ignoring domain-specific environmental backgrounds.
  • Figure 3: Visualization of Loss Landscapes. Columns from left to right: HCD, Bonsai, and ERM. HCD demonstrates a significantly flatter and more expansive basin structure, reflecting its superior stability and reduced sensitivity to the distribution shifts encountered in open-world environments.