Beyond Reweighting: On the Predictive Role of Covariate Shift in Effect Generalization

Ying Jin, Naoki Egami, Dominik Rothenhäusler

TL;DR

The paper tackles generalization under distribution shift by challenging the dominance of covariate shift and showing that observable covariate shift can predict the magnitude of unobserved conditional shift. It introduces standardized, pivotal measures for covariate and conditional shifts and grounds them in a random distribution shift model, supported by two large-scale replication datasets (Pipeline and ManyLabs 1) spanning 680 studies across 65 sites. This enables construction of prediction intervals for target estimates that achieve valid coverage with substantially shorter intervals than worst-case bounds, offering a data-adaptive middle ground between IID assumptions and adversarial shifts. The approach provides practical tools for uncertainty quantification in external validity tasks and motivates data collection strategies that prioritize understanding distribution shifts rather than merely adjusting covariates.

Abstract

Many existing approaches to generalizing statistical inference amidst distribution shift operate under the covariate shift assumption, which posits that the conditional distribution of unobserved variables given observable ones is invariant across populations. However, recent empirical investigations have demonstrated that adjusting for shift in observed variables (covariate shift) is often insufficient for generalization. In other words, covariate shift does not typically "explain away" the distribution shift between settings. As such, addressing the unknown yet non-negligible shift in the unobserved variables given observed ones (conditional shift) is crucial for generalizable inference. In this paper, we present empirical evidence from two large-scale multi-site replication studies to support a new role of covariate shift in "predicting" the strength of the unknown conditional shift. Analyzing 680 studies across 65 sites, we find that even though the conditional shift is non-negligible, its strength can often be bounded by that of the observable covariate shift. However, this pattern only emerges when the two sources of shift are quantified by our proposed standardized, "pivotal" measures. We then interpret this phenomenon by connecting it to similar patterns that can be theoretically derived from a random distribution shift model. Finally, we demonstrate that exploiting the predictive role of covariate shift leads to reliable and efficient uncertainty quantification for target estimates in generalization tasks with partially observed data. Overall, our empirical and theoretical analyses suggest a new way to approach the problem of distributional shift, generalizability, and external validity.

Paper Structure

This paper contains 67 sections, 2 theorems, 92 equations, 21 figures, 5 tables.

Key Result

Theorem 3.3

Let $\widehat{\mathbb{E}}_Q[\psi]$ denote the sample mean of a function $\psi(T,X,U)$ over $n_Q$ i.i.d. draws from $Q$ and $\widehat{\mathbb{E}}_P[\psi]$ denote the sample mean of $\psi$ over $n_P$ i.i.d. draws from $P$. Under the random distributional shift model described above, for any function $ where $s_n^2 = \left( \frac{1}{n_P} + \frac{1}{n_Q} \right) \text{Var}_{P}(\psi) + \delta_\text{M

Figures (21)

  • Figure 1: Overview of the problem and our approach: Effect generalization from source and target populations needs to address the distribution shift consisting of the observed covariate shift and unobserved conditional shift. We argue for a novel predictive role of covariate shift in bounding the strength of unknown conditional shift, which is supported by our empirical findings and leads to reliable and efficient generalization.
  • Figure 2: Preview of results. [Left] Insufficient explanatory role of covariate shift: Empirical coverage of prediction intervals based on i.i.d. assumption (grey) and covariate shift assumption (green and purple), showing covariate shift cannot explain away distribution shift across sites. [Right] Reliable and efficient effect generalization based on the predictive role of covariate shift: Empirical coverage of prediction intervals based on i.i.d. assumption (grey), worst-case bounds (dark blue), and our method with the belief that conditional shift is bounded by covariate shift (red) or with knowledge of their relative strength (yellow).
  • Figure 3: Insufficient explanatory role of covariate shift. [Left]: Under-coverage of $95\%$ prediction intervals based on the i.i.d. assumption (grey) and covariate shift assumption adjusted via doubly robust estimator (green) and entropy balancing (purple), averaged over all pairs of sites within each hypothesis for the Pipeline project (P, a) and the ManyLabs 1 data (M, a), respectively. The red dashed line is the nominal level. [Right]: Estimates based on existing approaches (via doubly robust estimator (green) and entropy balancing (purple)) do not bring the source estimates (grey) closer to the target estimate (red dashed line). As illustrative examples, we show results when generalizing from all other sites to site 5 (raw ID) in hypothesis 5 in the Pipeline data (P, b) and when generalizing from all other sites to site 4 in hypothesis 4 in ManyLabs 1 data (M, b). The segments connect estimates for the same pairs of sites.
  • Figure 4: Our covariate shift measures bound conditional shift measures in various contexts (pivotality). [Left]: Conditional and covariate shift measures for site pairs between US and Europe/Non-US and site pairs within US in the Pipeline data (P, a) and the ManyLabs 1 data (M, a). [Right]: Conditional and covariate shift measures for all site pairs in hypotheses 5 and 6 in the Pipeline data (P, b), and those in hypotheses 3 and 4 in the ManyLabs 1 data (M, b). A few ($\leq 5$) largest values are removed for visualization. [Bottom]: Empirical quantiles of the ratios between conditional and covariate shift measures within each hypothesis in the Pipeline and ManyLabs 1 datasets (grey and brown curves). The red curves are multiples of the quantiles of standard normal distribution plotted for reference.
  • Figure 5: Visualization of the random distribution shift model. The original distribution is randomly perturbed to produce the distribution from which data are i.i.d. drawn. Our model assumes independent perturbation/reweighting of equal-probability small events and takes the number of small events to infinity.
  • ...and 16 more figures
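
The random reweighting idea in the Figure 5 caption can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' construction: the function name `perturbed_mean`, the use of exponential weights, and the choice to form equal-probability events from rank bins of $\psi$ are all hypothetical choices made for illustration.

```python
import numpy as np


def perturbed_mean(psi_values, n_events, rng):
    """Mean of psi under a randomly reweighted version of the empirical distribution.

    Assumptions (illustrative only): the sample atoms are split into
    `n_events` equally sized rank bins that play the role of
    equal-probability "small events", and each event receives an
    independent exponential weight.
    """
    psi_values = np.asarray(psi_values, dtype=float)
    n = len(psi_values)

    # Assign atoms to events by rank so each event holds about n / n_events atoms.
    order = np.argsort(psi_values)
    bins = np.empty(n, dtype=int)
    bins[order] = np.arange(n) * n_events // n

    # Independent positive weight per event, shared by all atoms in that event.
    event_weights = rng.exponential(scale=1.0, size=n_events)
    atom_weights = event_weights[bins]
    atom_weights = atom_weights / atom_weights.sum()  # renormalize to a probability vector

    return float(np.sum(atom_weights * psi_values))
```

Calling `perturbed_mean` repeatedly with fresh random draws and different values of `n_events` gives a rough feel for how a target mean can fluctuate around the original mean under this kind of perturbation.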

Theorems & Definitions (3)

  • Remark 3.2
  • Theorem 3.3: Distributional CLT
  • Theorem E.1