Out-of-distribution robustness for multivariate analysis via causal regularisation
Homer Durand, Gherardo Varando, Nathan Mankovich, Gustau Camps-Valls
TL;DR
The paper addresses out-of-distribution generalisation in multivariate analysis (MVA) by extending Anchor Regression (AR) to a broad class of MVA algorithms. It shows that, under a linear anchor structural causal model (SCM), the worst-case loss over anchor interventions reduces to a simple linear combination of training covariances, so that anchor-compatible losses achieve distributional robustness. The authors provide population and sample estimators, discuss parameter selection, and identify which MVA algorithms are anchor-compatible (e.g., MLR, OPLS, RRR, PLS) and which are not (e.g., CCA). Through simulations, climate detection and attribution (D&A), and air-quality experiments, they demonstrate that anchor-regularised MVA algorithms improve test performance and invariance under bounded anchor perturbations at a modest computational overhead. The work bridges causal inference and classical MVA, offering practical, robust tools for scientific applications prone to domain shift and paving the way for nonlinear (kernel) extensions.
Abstract
We propose a regularisation strategy for classical machine learning algorithms, rooted in causality, that ensures robustness against distribution shifts. Building upon the anchor regression framework, we demonstrate how incorporating a straightforward regularisation term into the loss function of classical multivariate analysis algorithms, such as (orthonormalised) partial least squares, reduced-rank regression, and multiple linear regression, enables out-of-distribution generalisation. Our framework allows users to efficiently verify the compatibility of a loss function with the regularisation strategy. Estimators for selected algorithms are provided, showcasing consistency and efficacy in synthetic and real-world climate science problems. The empirical validation highlights the versatility of anchor regularisation, emphasising its compatibility with multivariate analysis approaches and its role in enhancing replicability while guarding against distribution shifts. The extended anchor framework advances causal inference methodologies, addressing the need for reliable out-of-distribution generalisation.
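To make the "straightforward regularisation term" concrete, here is a minimal sketch for the multiple linear regression case. It is not taken from the paper: it assumes the standard anchor regression objective, ||(I - P_A)(Y - XB)||^2 + gamma * ||P_A(Y - XB)||^2, where P_A projects onto the column space of the anchor matrix A, and it solves that objective by ordinary least squares on transformed data. The function names, the simulated data, and the choice gamma = 5 are illustrative assumptions, and all variables are treated as centred.

```python
import numpy as np

def anchor_transform(A, gamma):
    """Return W = I - (1 - sqrt(gamma)) * P_A (assumed anchor-regression transform).

    P_A is the orthogonal projector onto the column space of the anchors A.
    For any residual R, ||W R||^2 = ||(I - P_A) R||^2 + gamma * ||P_A R||^2.
    """
    n = A.shape[0]
    P = A @ np.linalg.pinv(A)  # projector onto col(A)
    return np.eye(n) - (1.0 - np.sqrt(gamma)) * P

def anchor_mlr(X, Y, A, gamma):
    """Anchor-regularised multiple linear regression (illustrative sketch).

    gamma = 1 recovers ordinary least squares; larger gamma penalises the
    part of the residual that is explained by the anchors.
    """
    W = anchor_transform(A, gamma)
    B, *_ = np.linalg.lstsq(W @ X, W @ Y, rcond=None)
    return B

# Hypothetical simulated data: anchors A and a hidden confounder H drive X and Y.
rng = np.random.default_rng(0)
n, q, d, p = 200, 2, 3, 2
A = rng.normal(size=(n, q))
H = rng.normal(size=(n, 1))
X = A @ rng.normal(size=(q, d)) + H + rng.normal(size=(n, d))
Y = X @ rng.normal(size=(d, p)) + H + rng.normal(size=(n, p))

B_ols = anchor_mlr(X, Y, A, gamma=1.0)     # identical to plain least squares
B_anchor = anchor_mlr(X, Y, A, gamma=5.0)  # anchor-regularised coefficients
```

Because the regulariser only rescales the anchor-explained component of the data, any least-squares solver can be reused unchanged on the transformed matrices. In this sketch gamma is fixed by hand; in practice it would have to be chosen by a selection procedure.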
