Causality-oriented robustness: exploiting general noise interventions
Xinwei Shen, Peter Bühlmann, Armeen Taeb
TL;DR
This work addresses prediction under distribution shifts by introducing Distributional Robustness via Invariant Gradients (DRIG), a causality-oriented robustness framework. DRIG regularizes empirical risk minimization (ERM) to align gradient behavior across environments; it interpolates between in-distribution performance and causal robustness and generalizes anchor regression to general noise interventions. The authors provide theoretical guarantees in linear structural causal models (SCMs), connect DRIG to causal parameters, and extend it with the semi-supervised variants DRIG-A and DRIG-A+ for adapting to target distributions. They validate the approach on synthetic data, single-cell perturbation data, and ICU health records, demonstrating improved worst-case robustness and practical applicability. The results highlight a trade-off between predictive accuracy and causality, with DRIG offering a tunable, data-driven path toward robust predictions in real-world settings.
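To make the interpolation concrete, here is a minimal sketch for linear models with squared loss. It assumes (an assumption, not stated in this summary) that the regularized objective has the form E^0[ℓ(b)] + γ Σ_e ω_e (E^e[ℓ(b)] − E^0[ℓ(b)]), with environment 0 as the reference environment and weights ω_e summing to one. Under that assumption the minimizer solves a linear system built from per-environment second moments. The names `drig_linear` and `simulate`, the toy SCM, and all parameter values are hypothetical.

```python
import numpy as np


def drig_linear(envs, gamma, weights=None):
    """Closed-form minimiser of an ASSUMED DRIG-style least-squares objective.

    Assumed objective (illustration only):
        E^0[l(b)] + gamma * sum_e w_e * (E^e[l(b)] - E^0[l(b)]),
    with l(b) = (Y - X^T b)^2 and envs[0] the reference environment.
    gamma = 0 fits only envs[0]; gamma = 1 gives weighted pooled least squares.
    """
    if weights is None:
        # Default weights: proportional to sample size, summing to one.
        sizes = np.array([len(y) for _, y in envs], dtype=float)
        weights = sizes / sizes.sum()
    # Empirical second moments E^e[X X^T] and E^e[X Y] for each environment.
    grams = [X.T @ X / len(y) for X, y in envs]
    cross = [X.T @ y / len(y) for X, y in envs]
    G0, Z0 = grams[0], cross[0]
    # Setting the gradient of the quadratic objective to zero gives G b = Z.
    G = G0 + gamma * sum(w * (Ge - G0) for w, Ge in zip(weights, grams))
    Z = Z0 + gamma * sum(w * (Ze - Z0) for w, Ze in zip(weights, cross))
    return np.linalg.solve(G, Z)


def simulate(n, x_noise_sd, rng):
    """Hypothetical toy SCM: hidden H confounds X and Y; causal effect is 2.

    Environments differ only in the noise scale of X (a noise intervention).
    """
    H = rng.normal(size=n)
    X = H + x_noise_sd * rng.normal(size=n)
    Y = 2.0 * X + H + rng.normal(size=n)
    return X.reshape(-1, 1), Y


rng = np.random.default_rng(0)
envs = [simulate(50_000, 1.0, rng), simulate(50_000, 2.0, rng)]
for gamma in [0.0, 1.0, 10.0, 100.0]:
    print(f"gamma={gamma:>5}: slope={drig_linear(envs, gamma)[0]:.3f}")
```

In this toy model the confounder H biases least squares. Working through the population moments gives slopes of 2.5 at γ = 0, 8/3.5 ≈ 2.29 at γ = 1 (pooled least squares), 35/17 ≈ 2.06 at γ = 10 and 305/152 ≈ 2.01 at γ = 100, so larger γ moves the fit toward the causal coefficient 2; the printed sample estimates should land close to these values. The sketch does not check that the matrix G stays positive definite, which need not hold for large γ in other settings.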
Abstract
Since distribution shifts are common in real-world applications, there is a pressing need to develop prediction models that are robust against such shifts. Existing frameworks, such as empirical risk minimization or distributionally robust optimization, either lack generalizability to unseen distributions or rely on postulated distance measures. Alternatively, causality offers a data-driven and structural perspective on robust prediction. However, the assumptions necessary for causal inference can be overly stringent, and the robustness offered by such causal models often lacks flexibility. In this paper, we focus on causality-oriented robustness and propose Distributional Robustness via Invariant Gradients (DRIG), a method that exploits general noise interventions in training data for robust predictions against unseen interventions, and naturally interpolates between in-distribution prediction and causality. In a linear setting, we prove that DRIG yields predictions that are robust against a data-dependent class of distribution shifts. Furthermore, we show that our framework includes anchor regression as a special case, and that it yields prediction models that protect against more diverse perturbations. We establish finite-sample results and extend our approach to semi-supervised domain adaptation to further improve prediction performance. Finally, we empirically validate our methods on synthetic simulations and on single-cell and intensive care health datasets.
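For reference, here is a minimal sketch of the usual anchor regression estimator, the method the framework is said to include as a special case. It penalizes the part of the residual explained by exogenous anchor variables A, and relies on the identity that the objective ||(I − P_A)(y − Xb)||² + γ||P_A(y − Xb)||² equals ordinary least squares after multiplying the data by W = I − (1 − √γ)P_A, since W² = (I − P_A) + γP_A. The function name `anchor_regression` and the one-hot environment anchors in the usage lines are illustrative assumptions; the sketch does not reproduce the reduction from DRIG to anchor regression.

```python
import numpy as np


def anchor_regression(X, y, A, gamma):
    """Anchor regression via the usual data transformation (sketch).

    Minimises ||(I - P_A)(y - X b)||^2 + gamma * ||P_A (y - X b)||^2,
    where P_A is the orthogonal projection onto the column space of A.
    Because W = I - (1 - sqrt(gamma)) * P_A satisfies W^2 = (I - P_A) + gamma * P_A,
    this is ordinary least squares on (W X, W y).
    """
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    # Project the columns of [X, y] onto span(A) via least squares.
    XY = np.column_stack([X, y])
    proj = A @ np.linalg.lstsq(A, XY, rcond=None)[0]  # P_A [X, y]
    shrink = 1.0 - np.sqrt(gamma)
    Xt = X - shrink * proj[:, :-1]
    yt = y - shrink * proj[:, -1]
    return np.linalg.lstsq(Xt, yt, rcond=None)[0]


# Hypothetical usage: two environments encoded as one-hot anchors.
rng = np.random.default_rng(1)
n = 500
env = np.repeat([0, 1], n)
A = np.eye(2)[env]                      # one-hot anchor matrix, shape (2n, 2)
H = rng.normal(size=2 * n)              # hidden confounder
X = (1.5 * env + H + rng.normal(size=2 * n)).reshape(-1, 1)  # mean shift in X
y = 2.0 * X[:, 0] + H + rng.normal(size=2 * n)
for gamma in [0.0, 1.0, 100.0]:
    print(gamma, anchor_regression(X, y, A, gamma)[0])
```

Here γ = 1 recovers ordinary least squares, and larger γ puts more weight on the anchor-explained residual; with these one-hot anchors only mean shifts between environments are exploited.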
