DivIL: Unveiling and Addressing Over-Invariance for Out-of-Distribution Generalization

Jiaqi Wang, Yuhang Zhou, Zhixiong Zhang, Qiguang Chen, Yongqiang Chen, James Cheng

TL;DR

The paper identifies a fundamental limitation of invariant learning: over-invariance, where overly strong invariance constraints degrade important details and hurt OOD generalization. To address this, it introduces Diverse Invariant Learning (DivIL), which couples invariant penalties with unsupervised contrastive learning and a novel random masking strategy to diversify learned invariances. The authors formalize over-invariance, provide synthetic and real-data evidence, and validate DivIL across graphs, CMNIST, and natural language inference tasks with multiple backbones and data augmentations. Results show DivIL consistently improves OOD generalization over standard IL baselines, offering a practical, modality-spanning approach to robust distribution generalization. The work provides both theoretical insight and a scalable framework for enhancing invariant learning in real-world settings.

Abstract

Out-of-distribution (OOD) generalization requires a model to perform well on distributions that differ from, and may be far from, the training data. A popular approach is invariant learning (IL), in which strong constraints added during training compel the model to focus on invariant features rather than spurious ones. However, strong invariance constraints have pitfalls. Because the number of diverse environments is limited and the feature space is over-regularized, these constraints can discard important details of the invariant features while they suppress spurious correlations. We call this over-invariance, and it can also degrade generalization performance. We theoretically define over-invariance and observe that it occurs in various classic IL methods. To alleviate it, we propose a simple approach, Diverse Invariant Learning (DivIL), which adds unsupervised contrastive learning and a random masking mechanism to compensate for the invariant constraints and can be applied to various IL methods. Furthermore, we conduct experiments across multiple modalities, covering 12 datasets and 6 classic models, that verify our over-invariance insight and the effectiveness of the DivIL framework. Our code is available at https://github.com/kokolerk/DivIL.
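
As a rough illustration of how the three ingredients named in the abstract could fit together, the sketch below combines an average risk, an invariance penalty, and an unsupervised contrastive term computed on randomly masked features. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the VREx-style variance penalty stands in for "the invariant constraint", the InfoNCE form of the contrastive loss, the per-dimension feature mask, and the names `lam`, `alpha`, `mask_rate`, and `temperature` are all illustrative choices that do not come from the paper.

```python
import numpy as np


def random_mask(features, mask_rate, rng):
    """Zero out a random subset of feature dimensions (assumed masking scheme)."""
    keep = rng.random(features.shape[1]) >= mask_rate
    return features * keep


def info_nce(z1, z2, temperature=0.5):
    """InfoNCE loss: row i of z1 and row i of z2 are treated as the positive pair."""
    z1 = z1 / (np.linalg.norm(z1, axis=1, keepdims=True) + 1e-12)
    z2 = z2 / (np.linalg.norm(z2, axis=1, keepdims=True) + 1e-12)
    logits = z1 @ z2.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def vrex_penalty(env_risks):
    """VREx-style invariance penalty: variance of the per-environment risks."""
    return float(np.var(env_risks))


def divil_style_loss(env_risks, z1, z2, lam=1.0, alpha=0.1, mask_rate=0.3, rng=None):
    """Hypothetical combined objective: mean risk + lam * penalty + alpha * contrastive."""
    rng = np.random.default_rng(0) if rng is None else rng
    erm = float(np.mean(env_risks))
    # Assumption: each view gets its own independent feature mask.
    ucl = info_nce(random_mask(z1, mask_rate, rng), random_mask(z2, mask_rate, rng))
    return erm + lam * vrex_penalty(env_risks) + alpha * ucl


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    view_a = rng.normal(size=(8, 16))                   # features of one augmented view
    view_b = view_a + 0.1 * rng.normal(size=(8, 16))    # features of a second view
    print(divil_style_loss([0.2, 0.4], view_a, view_b, rng=rng))
```

In this sketch, setting `alpha=0` recovers a plain penalty-based IL objective, so the contrastive term can be read as the add-on that counteracts an overly strong invariance penalty.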

Paper Structure

This paper contains 45 sections, 12 equations, 11 figures, 9 tables, 1 algorithm.

Figures (11)

  • Figure 1: (a) shows the structural causal model relating invariant and spurious features to the invariance and the environments. (b) shows the over-invariance issue in the graph domain. Each graph $G$ consists of an invariant subgraph $G_c$ (star, house) and a spurious subgraph $G_s$ (wheel, tree). Previous IL methods suppress spurious subgraphs but sacrifice important details of the invariant subgraphs, causing the over-invariance issue. The circle marks the invariant subgraph $\hat{G}_c$ identified by the model.
  • Figure 2: Illustrations of three structural causal models (SCMs).
  • Figure 3: The strength on different subsets of the invariant feature. Lower strength means less preference, which verifies the existence of the over-invariance issue. The X-axis represents the logarithm of the spurious variance $\sigma_s$. The Y-axis shows the strength of the corresponding subset of invariant features $\Phi(x)$ under varying invariant variances $\sigma_c \in \{0.1, 1, 3, 5\}$. For each configuration, we run 10 different seeds and report the average results.
  • Figure 4: Data augmentation for $\mathcal{L}_{ucl}$ in DivIL across modalities. Top left: randomly masking parts of the image to 0. Bottom left: edge removal and node dropping for graphs. Right: we feed the same input sequence to the encoder twice with different dropout masks to obtain the positive pair.
  • Figure 5: The increase in strength after incorporating UCL into IRMv1 and VREx. The X-axis represents different invariant variances $\sigma_c \in \{5, 3, 1, 0.1\}$. The Y-axis shows the strength of the corresponding subset of invariant features $\Phi(x)$ before and after adding the UCL. For each configuration, we run 10 different seeds and report the average results.
  • ...and 6 more figures
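
The image and graph augmentations named in the Figure 4 caption (masking to 0, edge removal, node dropping) can be sketched in a few lines. This is only an illustrative NumPy sketch under assumptions: the function names, the independent per-element drop probabilities, and the `(2, E)` edge-list layout are hypothetical choices, not details taken from the paper or its code. The dropout-based text augmentation is omitted because it depends on the encoder.

```python
import numpy as np


def mask_pixels(image, mask_rate, rng):
    """Set each pixel to 0 independently with probability mask_rate."""
    keep = rng.random(image.shape) >= mask_rate
    return image * keep


def drop_edges(edge_index, drop_rate, rng):
    """Remove each edge (a column of the (2, E) array) with probability drop_rate."""
    keep = rng.random(edge_index.shape[1]) >= drop_rate
    return edge_index[:, keep]


def drop_nodes(edge_index, num_nodes, drop_rate, rng):
    """Drop nodes with probability drop_rate and remove their incident edges.

    Node ids are not relabeled; the boolean array of kept nodes is returned too.
    """
    kept = rng.random(num_nodes) >= drop_rate
    edge_keep = kept[edge_index[0]] & kept[edge_index[1]]
    return edge_index[:, edge_keep], kept


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((28, 28))
    edges = np.array([[0, 1, 2, 3], [1, 2, 3, 0]])  # a 4-node cycle
    print(mask_pixels(image, 0.3, rng).shape)
    print(drop_edges(edges, 0.5, rng))
    print(drop_nodes(edges, 4, 0.25, rng))
```

Calling one of these functions twice on the same input with different random draws yields two views that could serve as a positive pair for a contrastive loss.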

Theorems & Definitions (7)

  • Definition 3.1: Invariance Principle
  • Definition 3.2: Invariant feature
  • Definition 3.3: Over-Invariant feature
  • Definition 3.4: Data Generation
  • Definition 3.5: Strength
  • Remark 3.6: Over-invariance issue
  • Remark 4.1: Effectiveness of DivIL