Class Is Invariant to Context and Vice Versa: On Learning Invariance for Out-Of-Distribution Generalization

Jiaxin Qi; Kaihua Tang; Qianru Sun; Xian-Sheng Hua; Hanwang Zhang

TL;DR

The paper tackles Out-Of-Distribution (OOD) generalization under context imbalance by arguing that, just as class is invariant to context, context is also invariant to class, which removes the need for context annotations. It introduces IRMCon, which learns a context representation by minimizing an intra-class contrastive loss $L_{ct}$ within an IRM framework that treats the classes as environments, yielding a context extractor $\phi_t$ that aligns with $\mathbf{x}_t$; the estimated context is then used in an Inverse Probability Weighting (IPW) scheme to train a context-balanced classifier (IRMCon-IPW). Experiments on context-biased datasets (Colored MNIST, Corrupted CIFAR-10, BAR) and domain-gap datasets (PACS) show state-of-the-art OOD performance under context bias and competitive results on domain generalization, with models trained without pretraining to avoid leakage. Theoretical justification is provided in the appendix, and the method requires no context labels.

Abstract

Out-Of-Distribution generalization (OOD) is all about learning invariance against environmental changes. If the context in every class is evenly distributed, OOD would be trivial because the context can be easily removed due to an underlying principle: class is invariant to context. However, collecting such a balanced dataset is impractical. Learning on imbalanced data biases the model toward context and thus hurts OOD. Therefore, the key to OOD is context balance. We argue that the widely adopted assumption in prior work, that the context bias can be directly annotated or estimated from biased class prediction, renders the context incomplete or even incorrect. In contrast, we point out the ever-overlooked other side of the above principle: context is also invariant to class, which motivates us to consider the classes (which are already labeled) as the varying environments to resolve context bias (without context labels). We implement this idea by minimizing the contrastive loss of intra-class sample similarity while ensuring this similarity is invariant across all classes. On benchmarks with various context biases and domain gaps, we show that a simple re-weighting based classifier equipped with our context estimation achieves state-of-the-art performance. We provide the theoretical justifications in the Appendix and code at https://github.com/simpleshinobu/IRMCon.
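The idea in the abstract (an intra-class contrastive loss whose value should be invariant across classes) can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the InfoNCE-style loss, the cosine similarity, the temperature `tau`, the weight `lam`, and all function names are assumptions made for illustration. The invariance term follows the common IRMv1 recipe of penalizing the squared gradient of each environment's risk with respect to a dummy scalar `w` fixed at 1, with each class treated as one environment.

```python
import math
from collections import defaultdict


def split_by_class(labels):
    """Group sample indices by class label; each class acts as one environment."""
    envs = defaultdict(list)
    for idx, y in enumerate(labels):
        envs[y].append(idx)
    return dict(envs)


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss_and_grad(anchor, positive, negatives, tau=0.5, w=1.0):
    """InfoNCE-style loss with a dummy scalar w multiplying the logits.

    Returns (loss, d loss / d w), both evaluated at the given w.
    """
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    base = [s / tau for s in sims]           # logits before the dummy scale
    logits = [w * b for b in base]
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    loss = -math.log(probs[0])               # index 0 is the positive pair
    grad_w = sum(p * b for p, b in zip(probs, base)) - base[0]
    return loss, grad_w


def irm_contrastive_objective(views_a, views_b, labels, lam=1.0, tau=0.5):
    """Sum over class-environments of (risk + lam * squared dummy gradient)."""
    total, per_env = 0.0, {}
    for cls, idxs in split_by_class(labels).items():
        if len(idxs) < 2:
            continue  # no intra-class negatives available for this class
        results = [
            contrastive_loss_and_grad(
                views_a[i], views_b[i],
                [views_a[j] for j in idxs if j != i], tau)
            for i in idxs
        ]
        risk = sum(r[0] for r in results) / len(results)
        grad = sum(r[1] for r in results) / len(results)
        per_env[cls] = (risk, grad * grad)
        total += risk + lam * grad * grad
    return total, per_env


# Toy features: views_b[i] stands in for an augmented view of views_a[i].
views_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
views_b = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.8], [0.8, -1.0]]
labels = [0, 0, 1, 1]
total, per_env = irm_contrastive_objective(views_a, views_b, labels)
```

Because negatives are drawn only from the anchor's own class, class information cannot help separate them, so lowering the loss pushes the features toward what varies within a class, namely the context. In a real training loop the features would come from a learnable extractor and the objective would be minimized by gradient descent; here the features are fixed toy vectors.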

Paper Structure (11 sections, 8 equations, 8 figures, 2 tables)

Introduction
Related Work
Common Pipeline: Invariance as Class
Empirical Risk Minimization (ERM)
Invariant Risk Minimization (IRM)
Inverse Probability Weighting (IPW)
Our Approach: Invariance as Context
Experiments
Datasets and Settings
Results and Analyses
Conclusions

Figures (8)

Figure 1: GradCAM [selvaraju2017grad] visualizations of learned class and context. In (a) and (b): By using ERM, if the context is diverse and balanced within a class, the class feature is accurate, focusing on the human's action; if the context dominates in the data, the class feature contains the context feature, e.g., the background "grass". In (c): The conventional context estimation of LfF [lff], based on Principle 1, is biased to class (focusing on the class of human action "throwing"), while our IRMCon, based on Principle 2, estimates better context (focusing on the background).
Figure 2: Illustrations of the related approaches [irm, carlucci2019domain, feataug, li2018deep, lff, volpi2019addressing, zhang2021deep]. ERM is the baseline. The other methods and ours aim to mitigate context bias. The components are elaborated below. 1) The length of a context bar indicates the number of samples in that context; a longer bar means the context is more prevalent. 2) A sole bar with the mixture of a color and a class number denotes the feature biased to the prevailing context. Our implementation method IRMCon-IPW is based on IRM and IPW, and our technical contribution (over the conventional methods of IRM or IPW) is the approach of disentangling context features not by using but by eliminating class features. We provide a theoretical justification in Section 4 and an empirical evaluation in Section 5.2.
Figure 3: The training pipeline of our IRMCon-IPW. 1) "split env." denotes that we split the training samples in a mini-batch into subsets based on class labels, i.e., the samples of each class form one subset, giving $N$ environments $\{e_i\}_1^N$; 2) $\theta$ is a dummy classifier, whose gradient regularizes $\phi_t$ to become invariant to classes. See the detailed algorithm in the Appendix.
Figure 4: t-SNE [van2008visualizing] visualizations of our context features of the Colored MNIST test samples. The color of points denotes their class labels. IRMCon is trained on the 99% biased training set. Features are naturally clustered by context. As there is no context ground-truth, the context labels are interpreted by us.
Figure 5: Illustrations of the reweighted sample frequencies for 10 color contexts. All models are trained on the 99.5% biased Colored MNIST. The reweighted frequency of a context indicates the normalized sum over the inverse probabilities of the samples in this context. Top: Biased context distribution in the training set. Middle: Biased context distribution derived by using LfF [lff]. Bottom: Relatively balanced context distribution by using our method.
...and 3 more figures
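
The reweighted frequency described in the Figure 5 caption (the normalized sum of inverse probabilities per context) can be sketched as below. This is a hedged illustration only: the function names, the toy context labels, and the per-sample probabilities are hypothetical, and how the probability of a sample's context is estimated from the learned context features is not specified in this summary.

```python
from collections import defaultdict


def ipw_weights(propensities):
    """Inverse probability weight 1/p for each sample (p must be > 0)."""
    return [1.0 / p for p in propensities]


def reweighted_context_frequency(contexts, propensities):
    """Normalized sum of inverse probabilities over the samples of each context."""
    totals = defaultdict(float)
    for ctx, weight in zip(contexts, ipw_weights(propensities)):
        totals[ctx] += weight
    z = sum(totals.values())
    return {ctx: t / z for ctx, t in totals.items()}


# Toy class with a 99% / 1% context split (assumed, perfectly estimated probabilities).
contexts = ["red"] * 99 + ["green"]
propensities = [0.99] * 99 + [0.01]
freq = reweighted_context_frequency(contexts, propensities)
# freq["red"] and freq["green"] are both approximately 0.5
```

With accurate probabilities the rare context is up-weighted until both contexts contribute equally, which is the balanced case the caption describes; a biased probability estimate would leave the reweighted frequencies skewed.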
