Linear unit-tests for invariance discovery

Benjamin Aubin, Agnieszka Słowik, Martin Arjovsky, Leon Bottou, David Lopez-Paz

TL;DR

The paper tackles out-of-distribution (OOD) generalization by introducing six linear, low-dimensional unit tests that separate invariant causal signals from spurious correlations. It benchmarks ERM, IRMv1, IGA, and ANDMask against an Oracle baseline and finds that none of the three recently proposed invariance methods (IRMv1, IGA, ANDMask) passes all tests, with IRMv1 and ANDMask offering limited success on specific problems. By releasing replication code, the authors provide a transparent, standardized platform to evaluate and stress-test invariance-based approaches. The work highlights the gap between theoretical invariance guarantees and practical OOD robustness, motivating the development of more reliable methods and benchmarks.

Abstract

There is increasing interest in algorithms that learn invariant correlations across training environments. A large share of the current proposals find theoretical support in the causality literature, but how useful are they in practice? The purpose of this note is to propose six linear low-dimensional problems -- unit tests -- to evaluate different types of out-of-distribution generalization in a precise manner. In initial experiments, none of the three recently proposed alternatives passes all tests. By providing the code to automatically replicate all the results in this manuscript (https://www.github.com/facebookresearch/InvarianceUnitTests), we hope that our unit tests become a standard stepping stone for researchers in out-of-distribution generalization.

Paper Structure

This paper contains 14 sections, 8 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: Test error averaged across environments (E0, E1, E2) for $(d_\text{inv}, d_\text{spu}, n_\text{env})=(5,5,3)$.
  • Figure 2: Test error averaged across environments for ANDMask, ERM, IGA, IRMv1 and Oracle on the unit-tests as (top) a function of the ratio $\delta_\text{env} = {n_\text{env}} / {d_\text{spu}}$ at fixed dimensions $(d_\text{inv}, d_\text{spu}) = (5, 5)$; and as (bottom) a function of $\delta_\text{spu} = {d_\text{spu}} /{d_\text{inv}}$ for $(d_\text{inv}, n_\text{env}) = (5, 3)$.