Anchor regression: heterogeneous data meets causality

Dominik Rothenhäusler, Nicolai Meinshausen, Peter Bühlmann, Jonas Peters

TL;DR

Anchor regression addresses predictive generalization under distributional shifts by leveraging exogenous anchors to regularize the least-squares loss. It forms a continuous path between partialling out, ordinary least squares, and two-stage least squares, with a rigorous minimax interpretation: the penalized criterion equals the worst-case risk over a class of shift interventions. The approach yields distributional robustness and improved replicability of variable selection, even when instrumental-variable assumptions fail, and it provides finite-sample bounds in high dimensions. Empirical results on GTEx and bike-sharing data illustrate enhanced stability and predictive reliability across heterogeneous domains, supporting the method's practical utility for robust inference under structured perturbations. The work also outlines practical guidance for anchor choice and parameter tuning, and discusses extensions to nonlinear models and other perturbation types.

Abstract

We consider the problem of predicting a response variable from a set of covariates on a data set that differs in distribution from the training data. Causal parameters are optimal in terms of predictive accuracy if in the new distribution either many variables are affected by interventions or only some variables are affected, but the perturbations are strong. If the training and test distributions differ by a shift, causal parameters might be too conservative to perform well on the above task. This motivates anchor regression, a method that makes use of exogenous variables to solve a relaxation of the causal minimax problem by considering a modification of the least-squares loss. The procedure naturally provides an interpolation between the solutions of ordinary least squares and two-stage least squares. We prove that the estimator satisfies predictive guarantees in terms of distributional robustness against shifts in a linear class; these guarantees are valid even if the instrumental variables assumptions are violated. If anchor regression and least squares provide the same answer (anchor stability), we establish that OLS parameters are invariant under certain distributional changes. Anchor regression is shown empirically to improve replicability and protect against distributional shifts.
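
The "modification of the least-squares loss" can be illustrated with a minimal sketch. It assumes centered data and uses the fact that the penalized criterion can be minimized by rescaling the part of the data explained by the anchors and then running ordinary least squares. The function name `anchor_regression`, the helper `project_on_anchors`, and the toy data-generating process are hypothetical choices made for illustration, not taken from the paper.

```python
import numpy as np


def anchor_regression(X, Y, A, gamma):
    """Sketch of anchor regression via a data transformation followed by OLS.

    X: (n, d) covariates, Y: (n,) response, A: (n, q) anchors, gamma >= 0.
    Assumes all columns are centered.
    """
    def project_on_anchors(Z):
        # Fitted values of a least-squares regression of Z on A, i.e. P_A Z,
        # computed without forming the n x n projection matrix.
        coef, *_ = np.linalg.lstsq(A, Z, rcond=None)
        return A @ coef

    # (Id - P_A) Z + sqrt(gamma) P_A Z  ==  Z + (sqrt(gamma) - 1) P_A Z
    scale = np.sqrt(gamma) - 1.0
    X_tilde = X + scale * project_on_anchors(X)
    Y_tilde = Y + scale * project_on_anchors(Y)
    b, *_ = np.linalg.lstsq(X_tilde, Y_tilde, rcond=None)
    return b


# Hypothetical toy data (not from the paper): one anchor, one hidden confounder.
rng = np.random.default_rng(0)
n = 5000
A = rng.normal(size=(n, 1))
H = rng.normal(size=n)
X = (A[:, 0] + H + rng.normal(size=n)).reshape(-1, 1)
Y = X[:, 0] + 2.0 * H + rng.normal(size=n)
A, X, Y = A - A.mean(axis=0), X - X.mean(axis=0), Y - Y.mean()

b_pa = anchor_regression(X, Y, A, gamma=0.0)   # partialling out
b_ols = anchor_regression(X, Y, A, gamma=1.0)  # ordinary least squares
b_mid = anchor_regression(X, Y, A, gamma=5.0)  # intermediate regularization
b_iv = anchor_regression(X, Y, A, gamma=1e8)   # approaches two-stage least squares
```

Projecting by regressing on `A` rather than building the full projection matrix keeps memory linear in the sample size; for a single covariate the coefficient moves monotonically from the partialling-out solution toward the two-stage least squares solution as `gamma` grows.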

Paper Structure

This paper contains 58 sections, 16 theorems, 153 equations, 14 figures.

Key Result

Theorem 1

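Let the assumptions of Section sec:setting-notation hold. For any $b \in \mathbb{R}^{d}$ we have

$$\mathbb{E}_{\text{train}}\big[((\mathrm{Id} - P_A)(Y - X^\intercal b))^{2}\big] + \gamma\, \mathbb{E}_{\text{train}}\big[(P_A(Y - X^\intercal b))^{2}\big] = \sup_{v \in C^{\gamma}} \mathbb{E}_{v}\big[(Y - X^\intercal b)^{2}\big],$$

where

$$C^{\gamma} := \big\{ v : v v^\intercal \preceq \gamma\, \mathbf{M}\, \mathbb{E}_{\text{train}}[A A^\intercal]\, \mathbf{M}^\intercal \big\}$$

and $\mathbf{M}$ is the shift matrix, cf. equation eq:32. A formulation of the result where $v$ is allowed to be random can be found in the Appendix, Section sec:theor-refthm:-regr.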
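
The interpolation behind this result can be made explicit with a short case analysis of the penalized criterion. The notation is an assumption made for illustration: $R_b := Y - X^\intercal b$ is the residual, $P_A$ is the $L^2$ projection onto the linear span of the anchors $A$, and $\ell_\gamma(b)$ is shorthand for the penalized criterion on the training distribution.

```latex
\begin{align*}
\ell_\gamma(b) &:= \mathbb{E}_{\text{train}}\big[((\mathrm{Id}-P_A)R_b)^2\big]
                 + \gamma\,\mathbb{E}_{\text{train}}\big[(P_A R_b)^2\big], \\[4pt]
\gamma = 0:\quad
\ell_0(b) &= \mathbb{E}_{\text{train}}\big[((\mathrm{Id}-P_A)R_b)^2\big]
  && \text{(partialling out, PA)}, \\
\gamma = 1:\quad
\ell_1(b) &= \mathbb{E}_{\text{train}}\big[R_b^2\big]
  && \text{(OLS), since }
     \mathbb{E}_{\text{train}}\big[((\mathrm{Id}-P_A)R_b)\,(P_A R_b)\big] = 0, \\
\gamma \to \infty:\quad
\tfrac{1}{\gamma}\,\ell_\gamma(b) &\to \mathbb{E}_{\text{train}}\big[(P_A R_b)^2\big]
  && \text{(two-stage least squares criterion, IV)}.
\end{align*}
```

The middle line uses only the orthogonality of a projection and its complement, which is why $\gamma = 1$ recovers ordinary least squares exactly.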

Figures (14)

  • Figure 1: IV, OLS, PA and anchor regression coefficients are computed on unshifted data. The plot shows the MSE $\mathbb{E}_{v}[(Y-X^\intercal b)^{2}]$ on shifted variables for varying coefficients $b=b^{\gamma}$, $\gamma \in (0,\infty)$. The SEM for both shifted and unshifted data is given in Example \ref{ex:2}. The optimal coefficient lies between IV and OLS.
  • Figure 2: Predictive performance of the direct causal effect (IV), PA, OLS and anchor regression with $\gamma=5$ under varying interventions on $X$. The SEM is taken from Example \ref{ex:2}. The MSE $\mathbb{E}_{v}[(Y - X^\intercal b)^{2}]$ is depicted under perturbation strength $v = (t,0,0)^{\intercal}$. The causal parameter (IV) exhibits constant predictive performance under arbitrary perturbation strength $|t|$, but predictive performance under small perturbations is subpar. PA and OLS have very good performance under small interventions but performance suffers under larger interventions. Anchor regression with $\gamma=5$ trades performance on unperturbed data ($t=0$) for more stability, i.e., better performance on medium-sized interventions. In particular, it is minimax optimal under shifts $C^{5} = \{(t,0,0)^{\intercal} : |t| \le \sqrt{5} \approx 2.24 \}$, cf. Theorem \ref{thm:anchor-regression}. For large shifts $|t|$ the IV method eventually outperforms anchor regression. Note that all shown solutions are anchor solutions, under respective penalties $\gamma=0$ (PA), $\gamma=1$ (OLS), $\gamma=5$ and $\gamma=\infty$ (IV).
  • Figure 3: Predictive performance of the direct causal effect, PA, OLS and anchor regression under varying interventions on $H$. The MSE $\mathbb{E}_{v}[(Y-X^\intercal b)^{2}]$ is depicted under varying perturbations $v=(0,0,t)^{\intercal}$. The corresponding structural equation models are given in equation \ref{eq:26}. For small perturbations, PA and OLS perform better than anchor regression. The direct causal effect exhibits large MSE for all values of $t$. While the direct causal effect shows stable predictive performance under interventions on $X$ (as discussed in Section \ref{sec:trad-perf-pert}), this is at the expense of predictive stability under interventions on $H$ or $Y$. The MSE of anchor regression with $\gamma=5$ slowly grows in $|t|$.
  • Figure 4: Replicability of variable selection in GTEx data. The plot shows how many of the $K$ top-ranked features, $K \in \{1,\ldots ,20\}$, found by anchor regression and Lasso on one tissue $t$ are also among the $K$ top-ranked features on another tissue $t'$. The counts are summed over all other tissues $t' \neq t$, averaged over all tissues $t$ and over 200 random choices of $y$, and shown on the vertical axis. For anchor regression the ranking is according to \ref{eq:anchor-ranking}, and for Lasso according to \ref{eq:lasso-ranking}. The legend describes the method used on one tissue $t$ and the method used on another tissue $t'$. Anchor regression exhibits the highest degree of replicability.
  • Figure 5: Daily average squared residuals $\hat{\mathbb{E}}_{\text{test}}[(Y-X^{\intercal} \hat{b}^{\gamma})^{2}|A]$ as a function of $\gamma$. Each line corresponds to a quantile of $\hat{\mathbb{E}}_{\text{test}}[(Y-X^{\intercal} \hat{b}^{\gamma})^{2}|A]$. The quantiles are chosen in the set $\{0.005,0.01,\ldots,0.995\}$, with the median marked in red. For growing $\gamma$, the upper percentiles of $\hat{\mathbb{E}}_{\text{test}}[(Y-X^{\intercal} \hat{b}^{\gamma})^{2}|A]$ are decreasing while the lower percentiles are slightly increasing. This is in line with the theory presented in Section \ref{sec:optim-pred-perf}. The distribution of bike rentals is expected to change from day to day. For growing $\gamma$, the upper percentiles of the loss are reduced, i.e., predictions are increasingly reliable across days. A comparison to OLS with $\gamma = 1$ is given in the right panel of Figure \ref{fig:optgamma}.
  • ...and 9 more figures
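
The replicability count described in the Figure 4 caption can be sketched for a single pair of tissues. This is a minimal illustration, not the paper's procedure: the function name `top_k_overlap` is hypothetical, ranking features by absolute score is an assumption (the paper's ranking equations are not reproduced here), and the score vectors are made-up numbers. The summation over tissues and the averaging over responses are left out.

```python
import numpy as np


def top_k_overlap(scores_t, scores_t_prime, k):
    """Count the features that are among the k top-ranked on both tissues.

    Assumption: a feature's rank is determined by the absolute value of its
    score (larger means more important).
    """
    top_t = set(np.argsort(-np.abs(scores_t))[:k].tolist())
    top_t_prime = set(np.argsort(-np.abs(scores_t_prime))[:k].tolist())
    return len(top_t & top_t_prime)


# Hypothetical feature scores on two tissues (five features each).
scores_a = np.array([0.9, -0.1, 0.5, 0.05, -0.7])
scores_b = np.array([0.8, 0.6, -0.05, 0.1, 0.7])

# One overlap count per K, i.e. one replicability curve for this tissue pair.
curve = [top_k_overlap(scores_a, scores_b, k) for k in range(1, 6)]
```

Plotting `curve` against `k` gives the kind of curve shown in the figure, restricted to one tissue pair and one response.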

Theorems & Definitions (32)

  • Example 1: Three examples of graphs $G$ which are in our model class
  • Example 2
  • Theorem 1
  • Lemma 1
  • Lemma 2
  • Theorem 2: Replicability of $b^{\rightarrow \infty}$
  • Proposition 1
  • Theorem 3: Anchor stability, predictive stability and replicability
  • Theorem 4: Anchor stability implies causality
  • Theorem 5
  • ...and 22 more