Achievable distributional robustness when the robust risk is only partially identified

Julia Kostin, Nicola Gnecco, Fanny Yang

TL;DR

This work addresses distributional robustness when the robust risk is not fully identifiable. It proposes the worst-case robust risk and a population minimax quantity to characterize the best achievable robustness under partial identifiability. A concrete linear-SCM analysis shows how identifiability of the training-shift directions governs robustness and demonstrates that existing finite robustness methods can be suboptimal when unseen shifts are present. The authors validate the theory with synthetic and real-world gene-expression experiments, which show that accounting for partial identifiability yields better generalization under distribution shifts. The results suggest a change in how robustness is benchmarked, underscore the importance of partially identifiable robustness, and motivate extensions to nonlinear settings and active data collection strategies.

Abstract

In safety-critical applications, machine learning models should generalize well under worst-case distribution shifts, that is, have a small robust risk. Invariance-based algorithms can provably take advantage of structural assumptions on the shifts when the training distributions are heterogeneous enough to identify the robust risk. However, in practice, such identifiability conditions are rarely satisfied -- a scenario so far underexplored in the theoretical literature. In this paper, we aim to fill the gap and propose to study the more general setting when the robust risk is only partially identifiable. In particular, we introduce the worst-case robust risk as a new measure of robustness that is always well-defined regardless of identifiability. Its minimum corresponds to an algorithm-independent (population) minimax quantity that measures the best achievable robustness under partial identifiability. While these concepts can be defined more broadly, in this paper we introduce and derive them explicitly for a linear model for concreteness of the presentation. First, we show that existing robustness methods are provably suboptimal in the partially identifiable case. We then evaluate these methods and the minimizer of the (empirical) worst-case robust risk on real-world gene expression data and find a similar trend: the test error of existing robustness methods grows increasingly suboptimal as the fraction of data from unseen environments increases, whereas accounting for partial identifiability allows for better generalization.
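
The two quantities named in the abstract can be written schematically as follows. This is only a sketch under assumed notation: $\mathcal{R}_{\mathrm{rob}}(\beta;\theta,M_{\mathrm{test}})$ is a placeholder symbol for the robust risk of a linear predictor $\beta$ when the data-generating parameters are $\theta$ and the test shifts are bounded by $M_{\mathrm{test}}$, and $\Theta_{\mathrm{eq}}$ denotes the set of parameters that are observationally equivalent given the training distributions. The symbols $\mathfrak{R}_{\mathrm{rob}}$ and $\mathfrak{M}$ are likewise illustrative; the paper's Definition 2 gives the precise statement, which may differ in detail.

```latex
% Worst-case robust risk: the largest robust risk over all parameters
% that are consistent with the training distributions (assumed notation).
\mathfrak{R}_{\mathrm{rob}}(\beta;\Theta_{\mathrm{eq}},M_{\mathrm{test}})
  = \sup_{\theta\in\Theta_{\mathrm{eq}}}
    \mathcal{R}_{\mathrm{rob}}(\beta;\theta,M_{\mathrm{test}})

% Minimax quantity: the smallest worst-case robust risk any predictor can attain.
\mathfrak{M}(\Theta_{\mathrm{eq}},M_{\mathrm{test}})
  = \inf_{\beta}\,
    \sup_{\theta\in\Theta_{\mathrm{eq}}}
    \mathcal{R}_{\mathrm{rob}}(\beta;\theta,M_{\mathrm{test}})
```

When $\Theta_{\mathrm{eq}}$ contains a single parameter, the supremum is trivial and the worst-case robust risk coincides with the ordinary robust risk.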

Paper Structure

This paper contains 41 sections, 9 theorems, 97 equations, 5 figures, 1 table, 1 algorithm.

Key Result

proposition 1

Suppose that the set of training and test distributions is generated according to sec:training-data and Assumption as:Mtest-structure holds. Then, where $\mathcal{B}^{rob}_{\Theta_{\mathrm{eq}}} = \{ \beta^{rob}_{\theta}: \theta \in \Theta_{\mathrm{eq}} \}$.

Figures (5)

  • Figure 1: (Left) SCM with hidden confounding and (right) induced graph. The model allows for an arbitrary causal structure of the observed variables $(X,Y)$, as long as $\mathbf{I} - \mathbf{B}$ is invertible, e.g. when the underlying graph is acyclic. The shifts across different distributions are captured via shift interventions on $X$. However, the model does not allow for interventions on the target variable $Y$ or hidden confounders $H$.
  • Figure 2: Relationship between identifiability of the model parameters and identifiability of the robust risk. (a) The classical scenario where the test shift upper bound $M_{\mathrm{test}} = M_{\text{seen}}$ is contained in the span of training shifts so that the robust risk is point-identified. (b) The more general scenario of this paper, where $M_{\mathrm{test}}=M_{\text{unseen}}$ contains new shift directions and where only a set can be identified in which the true robust risk lies.
  • Figure 3: Worst-case robust risk of the baseline estimators $\beta_{\mathrm{OLS}}$ and $\beta_{\text{anchor}}$ (using the "correct" $\gamma$) and of the worst-case robust predictor in (mean-shifted) multi-environment finite-sample experiments, together with the theoretical population lower bound. (Left) The classical identified setting with varying shift strength $\gamma$. (Right) The partially identifiable setting with fixed $\gamma$ but varying $\gamma'$. The details of the experimental setting can be found in \ref{sec:apx-experiments}.
  • Figure 4: Performance of the worst-case robust predictor (Worst-case Rob.) compared with other methods as a function of perturbation strength $s$. Different panels correspond to different proportions of unseen shift directions at test time. For each panel and perturbation strength $s$, each point represents an average over the 28 target genes and three experiments (i.e., training environments).
  • Figure 5: Structure of the (a) training-time shifts and (b-c) test-time shifts for different perturbation strengths, illustrated on two covariates. Panel (a) shows the training data, which contain two environments: observational (blue) and shifted (orange), the latter corresponding to the knockout of the gene ENSG00000089009. Panels (b) and (c) show the training data in grey and test data from a previously unseen environment (green). Panel (b) depicts the $10\%$ of test data points closest to the training support (perturbation strength = $0.1$). Panel (c) illustrates the full test data (perturbation strength = $1.0$).

Theorems & Definitions (13)

  • example 1
  • definition 1: Observational equivalence
  • definition 2: Worst-case robust risk and the minimax quantity
  • proposition 1: Identifiability of model parameters and robust predictor
  • theorem 3.1
  • corollary 3.2: Worst-case robust risk of the anchor regression estimator
  • proposition 2: Identifiability of reference distribution parameters and robust prediction model
  • lemma C.1
  • proof
  • proposition 3
  • ...and 3 more