Invariance Pair-Guided Learning: Enhancing Robustness in Neural Networks
Martin Surner, Abdelmajid Khelil, Ludwig Bothmann
TL;DR
This work tackles the challenge of out-of-distribution generalization arising from spurious correlations. It introduces Invariance Pair-Guided Learning (IPG), which uses carefully constructed invariance pairs to define an invariance condition and a corrective gradient that adaptively modulates the usual gradient descent updates, promoting invariant representations. The method is evaluated on ColoredMNIST, Waterbird-100, and CelebA, where it shows competitive or superior robustness to distribution shifts and more invariant latent representations than empirical risk minimization (ERM), with notable gains in worst-group accuracy in several settings. The approach remains data-efficient, does not require bias-conflicting groups, and can be extended with adversarial augmentation (IPG-AA). It offers a flexible framework for enforcing invariances and improving real-world robustness, while highlighting limitations related to pair quality and computational overhead.
Abstract
Out-of-distribution generalization of machine learning models remains challenging because the models are inherently bound to the training data distribution. This is especially apparent when the learned models rely on spurious correlations. Most existing approaches apply data manipulation, representation learning, or learning strategies to achieve generalizable models. Unfortunately, these approaches usually require multiple training domains, group labels, specialized augmentation, or pre-processing. We propose a novel approach that addresses these limitations by providing a technique to guide the neural network through the training phase. We first establish input pairs that represent the spurious attribute and describe the invariance, that is, a characteristic that should not affect the outcome of the model. Based on these pairs, we form a corrective gradient that complements the traditional gradient descent approach. We further make this correction mechanism adaptive based on a predefined invariance condition. Experiments on the ColoredMNIST, Waterbird-100, and CelebA datasets demonstrate the effectiveness of our approach and its robustness to group shifts.
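To make the idea of a pair-based corrective gradient concrete, the following is a minimal sketch on a toy logistic model. It is not the formulation used in this work: the squared logit difference as the invariance measure, the threshold rule `tol` as the invariance condition, the `strength` weight, and all function names are assumptions chosen for illustration. Feature 0 plays the role of the core signal and feature 1 the spurious attribute; each invariance pair shares the core feature and differs only in the spurious one.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def task_gradient(w, X, y):
    """Gradient of the mean binary cross-entropy of a logistic model."""
    p = sigmoid(X @ w)
    return X.T @ (p - y) / len(y)

def invariance_gap(w, pairs_a, pairs_b):
    """Assumed invariance measure: mean squared logit difference per pair."""
    d = (pairs_a - pairs_b) @ w
    return float(np.mean(d ** 2))

def corrective_gradient(w, pairs_a, pairs_b):
    """Gradient of invariance_gap with respect to w."""
    diff = pairs_a - pairs_b
    return 2.0 * diff.T @ (diff @ w) / len(diff)

def train(X, y, pairs_a, pairs_b, lr=0.1, strength=1.0, tol=1e-4, steps=500):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        g = task_gradient(w, X, y)
        # Assumed adaptive rule: apply the correction only while the
        # invariance condition (gap <= tol) is violated.
        if invariance_gap(w, pairs_a, pairs_b) > tol:
            g = g + strength * corrective_gradient(w, pairs_a, pairs_b)
        w = w - lr * g
    return w

# Toy data: the spurious feature is almost perfectly correlated with the label.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, n).astype(float)
core = (2 * y - 1) + rng.normal(0.0, 1.0, n)      # noisy but causal
spurious = (2 * y - 1) + rng.normal(0.0, 0.1, n)  # clean but spurious
X = np.column_stack([core, spurious])

# Invariance pairs: identical core feature, opposite spurious attribute.
pairs_a = np.column_stack([core, np.ones(n)])
pairs_b = np.column_stack([core, -np.ones(n)])

w_erm = train(X, y, pairs_a, pairs_b, strength=0.0)  # plain gradient descent
w_ipg = train(X, y, pairs_a, pairs_b, strength=1.0)  # with pair-based correction

print("weights without correction:", w_erm)
print("weights with correction:   ", w_ipg)
```

In this toy setting the correction pushes the weight on the spurious feature toward zero, while plain gradient descent leans on it heavily. A real network would compute both gradients by backpropagation rather than in closed form.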
