Invariant Risk Minimization
Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, David Lopez-Paz
TL;DR
IRM addresses distribution shift by learning a data representation on top of which a single classifier is simultaneously optimal in every training environment, which promotes out-of-distribution (OOD) generalization. It turns this idea into a concrete, implementable objective that adds a gradient-based invariance penalty to the training risk and extends to general losses and multivariate outputs. The paper connects invariance to causality: under reasonable diversity conditions across environments, invariant predictors rely on the direct causal parents of the target, which explains the improved OOD performance. Empirically, IRM outperforms ERM and prior methods on synthetic data and Colored MNIST, yielding predictors that are more robust to distribution shift and more faithful to the causal structure.
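To make the gradient-based penalty concrete, here is a minimal NumPy sketch, specialized to squared loss. It assumes the practical form of the objective in which a scalar "dummy" classifier w is fixed at 1.0 on top of the predictor, and the penalty for each environment is the squared gradient of that environment's risk with respect to w. The toy data-generating process and the names `make_env`, `irm_penalty`, and `irm_objective` are illustrative assumptions, not code from the paper.

```python
import numpy as np

def make_env(sigma, n, rng):
    """Hypothetical toy environment: x1 causes y, and y causes x2 (a non-causal feature)."""
    x1 = rng.normal(0.0, sigma, n)
    y = x1 + rng.normal(0.0, sigma, n)
    x2 = y + rng.normal(0.0, 1.0, n)
    return np.stack([x1, x2], axis=1), y

def irm_penalty(preds, targets):
    """Squared gradient of the squared-loss risk w.r.t. a scalar classifier w at w = 1.0.

    R(w) = mean((w * preds - targets)^2), so dR/dw at w = 1 is
    mean(2 * (preds - targets) * preds).
    """
    grad = np.mean(2.0 * (preds - targets) * preds)
    return grad ** 2

def irm_objective(envs, predict, lam):
    """Sum over environments of (risk + lam * invariance penalty)."""
    total = 0.0
    for x, y in envs:
        preds = predict(x)
        total += np.mean((preds - y) ** 2) + lam * irm_penalty(preds, y)
    return total

rng = np.random.default_rng(0)
envs = [make_env(sigma, 10_000, rng) for sigma in (1.0, 2.0)]

causal = lambda x: x[:, 0]    # predicts y from its causal parent x1
spurious = lambda x: x[:, 1]  # predicts y from its effect x2

for name, f in [("causal", causal), ("spurious", spurious)]:
    print(name, [irm_penalty(f(x), y) for x, y in envs])
```

In this toy setup the predictor built on x1 should incur a penalty close to zero in both environments, while the predictor built on x2 incurs a penalty of roughly 4, even though it has the lower squared error in the noisier environment. With a large enough `lam`, `irm_objective` therefore favors the causal predictor.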
Abstract
We introduce Invariant Risk Minimization (IRM), a learning paradigm to estimate invariant correlations across multiple training distributions. To achieve this goal, IRM learns a data representation such that the optimal classifier, on top of that data representation, matches for all training distributions. Through theory and experiments, we show how the invariances learned by IRM relate to the causal structures governing the data and enable out-of-distribution generalization.
