Invariant Probabilistic Prediction

Alexander Henzi; Xinwei Shen; Michael Law; Peter Bühlmann

TL;DR

It is shown that arbitrary distribution shifts do not, in general, admit invariant and robust probabilistic predictions, in contrast to the setting of point prediction, and a method to yield invariant probabilistic predictions is proposed.

Abstract

In recent years, there has been a growing interest in statistical methods that exhibit robust performance under distribution changes between training and test data. While most of the related research focuses on point predictions with the squared error loss, this article turns the focus towards probabilistic predictions, which aim to comprehensively quantify the uncertainty of an outcome variable given covariates. Within a causality-inspired framework, we investigate the invariance and robustness of probabilistic predictions with respect to proper scoring rules. We show that arbitrary distribution shifts do not, in general, admit invariant and robust probabilistic predictions, in contrast to the setting of point prediction. We illustrate how to choose evaluation metrics and restrict the class of distribution shifts to allow for identifiability and invariance in the prototypical Gaussian heteroscedastic linear model. Motivated by these findings, we propose a method to yield invariant probabilistic predictions, called IPP, and study the consistency of the underlying parameters. Finally, we demonstrate the empirical performance of our proposed procedure on simulated as well as on single-cell data.
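
As a rough illustration of what evaluating a probabilistic prediction with a proper scoring rule means in a Gaussian heteroscedastic linear model, the following sketch computes the logarithmic score and its expectation in closed form. The specific parametrization (mean $x^\top\beta$, standard deviation $\exp(x^\top\gamma)$), the function names, and the numerical values are assumptions made for illustration only; this is not the IPP estimator described in the paper.

```python
import math


def log_score(y, mu, sigma):
    """Logarithmic score: negative log density of N(mu, sigma^2) at y (smaller is better)."""
    return 0.5 * math.log(2 * math.pi * sigma**2) + (y - mu) ** 2 / (2 * sigma**2)


def expected_log_score(m, s, mu, sigma):
    """Closed-form E[log_score(Y, mu, sigma)] when Y ~ N(m, s^2)."""
    return 0.5 * math.log(2 * math.pi * sigma**2) + (s**2 + (m - mu) ** 2) / (2 * sigma**2)


def predict(x, beta, gamma):
    """Hypothetical heteroscedastic linear model: mean x'beta, sd exp(x'gamma)."""
    mu = sum(xi * bi for xi, bi in zip(x, beta))
    sigma = math.exp(sum(xi * gi for xi, gi in zip(x, gamma)))
    return mu, sigma


# Illustrative covariates and parameters (assumed values, not from the paper).
x = (1.0, 0.5)
true_mu, true_sigma = predict(x, beta=(0.0, 2.0), gamma=(0.0, 0.4))

# Expected score of the true conditional distribution versus two misspecified forecasts.
best = expected_log_score(true_mu, true_sigma, true_mu, true_sigma)
shifted = expected_log_score(true_mu, true_sigma, true_mu + 0.5, true_sigma)
overconfident = expected_log_score(true_mu, true_sigma, true_mu, 0.5 * true_sigma)

print(best < shifted, best < overconfident)
```

Propriety of the score shows up in the comparison: under the true distribution, the forecast with the correct mean and scale attains a smaller expected score than either the shifted or the overconfident forecast.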

Paper Structure (28 sections, 11 theorems, 67 equations, 17 figures, 2 tables)

Introduction
Background and setup
Model for observational distribution
Model for interventional distributions and heterogeneous data
Proper scoring rules
An illustrative example
Invariance and robustness in location-scale models
Impossibility of invariance under arbitrary interventions
Restricted interventions
Prediction intervals
Invariant probabilistic prediction
Empirical results
Simulations
Application on single-cell data
Discussion
...and 13 more sections

Key Result

Proposition 1

Consider the additive noise model as a special case of the model in \ref{eq:location_scale}. Let $T$ be a functional for which $T\{\mathcal{L}(\varepsilon_Y)\} = 0$, and let $L(t,Y)$ be a strictly consistent scoring function for $T$ that depends on $t$ and $Y$ only through $Y-t$ and for which $\mathbb{E}\{L(0, \varepsilon_Y)\}$ exists. Le

Figures (17)

Figure 1: Graphical structure of an example of models \ref{eq:model} and \ref{eq:perturbation}. There is a hidden confounding variable $H$ and the hammers indicate that perturbations or interventions happen at $X = (X_1,X_2)$ as in model \ref{eq:perturbation}.
Figure 2: Scores for the example from Section \ref{sec:restricted_interventions}, with $\alpha = 2$ for PseudoS. Solid lines are for $X^t = \Gamma^t\varepsilon_X$, dashed lines for the do-interventions, in orange color for $P^o_{y|x}$ and gray for $\pi^*_{y|x}$.
Figure 3: Simulation study: Estimation error for $\beta$ (top) and $\gamma$ (bottom) as a function of penalty parameter, separated into squared bias (blue, dot-dashed), variance (orange, dashed), and total error (gray, solid), for different $n^e$. The sample sizes are given on top of each panel. The bars show height-adjusted frequencies with which $\lambda$ is chosen by the rule described in Section \ref{sec:ipp} with $\alpha = 0.05$; $\lambda$ is set to the maximal value $15$ for this experiment if the hypothesis of equal risk is rejected for all $\lambda$.
Figure 4: Simulation study: Mean logarithmic score on new test environment arising from the high and low variance interventions (light blue, orange), mean shift (dark blue), intervention on correlation (green), and without interventions compared to the observational distribution (gray). The histograms are height-adjusted frequencies of the chosen penalty parameter, as in Figure \ref{fig:parameter_error}.
Figure 5: Single-cell data: Training risks of IPP over different environments (left panel); the dot is the mean risk, the inner whiskers the $0.25$- and $0.75$-quantiles of the scores, and the outer whiskers the $0.05$- and $0.95$-quantiles. The first environment is the observational environment, and the panel columns are different penalty parameter values. The right panel shows the p-value of the oneway.test for equal training risks as a function of the penalty, in gray for LogS and orange for SCRPS. Horizontal lines indicate the levels of $0.05$ and $0.1$.
...and 12 more figures

Theorems & Definitions (23)

Definition 1: Do-interventional distribution
Proposition 1
Definition 2: Invariance
Definition 3: Robustness
Proposition 2
Theorem 3
Lemma 4
Lemma 5: Exponential scale parametrization
Example 1
Proposition 6
...and 13 more
