Beyond Reweighting: On the Predictive Role of Covariate Shift in Effect Generalization

Ying Jin; Naoki Egami; Dominik Rothenhäusler

TL;DR

The paper addresses generalization under distribution shift. It questions the common assumption that covariate shift alone accounts for differences between populations, and shows instead that the observable covariate shift can predict the magnitude of the unobserved conditional shift. It introduces standardized, pivotal measures for covariate and conditional shift and grounds them in a random distribution shift model, with support from two large-scale replication datasets (Pipeline and ManyLabs 1) spanning 680 studies across 65 sites. These measures make it possible to construct prediction intervals for target estimates that achieve valid coverage while being substantially shorter than worst-case bounds, a data-adaptive middle ground between IID assumptions and adversarial shifts. The approach provides practical tools for uncertainty quantification in external validity tasks and motivates data collection strategies that prioritize understanding distribution shifts over merely adjusting for covariates.

Abstract

Many existing approaches to generalizing statistical inference amidst distribution shift operate under the covariate shift assumption, which posits that the conditional distribution of unobserved variables given observable ones is invariant across populations. However, recent empirical investigations have demonstrated that adjusting for shift in observed variables (covariate shift) is often insufficient for generalization. In other words, covariate shift does not typically "explain away" the distribution shift between settings. As such, addressing the unknown yet non-negligible shift in the unobserved variables given observed ones (conditional shift) is crucial for generalizable inference. In this paper, we present empirical evidence from two large-scale multi-site replication studies to support a new role of covariate shift in "predicting" the strength of the unknown conditional shift. Analyzing 680 studies across 65 sites, we find that even though the conditional shift is non-negligible, its strength can often be bounded by that of the observable covariate shift. However, this pattern only emerges when the two sources of shift are quantified by our proposed standardized, "pivotal" measures. We then interpret this phenomenon by connecting it to similar patterns that can be theoretically derived from a random distribution shift model. Finally, we demonstrate that exploiting the predictive role of covariate shift leads to reliable and efficient uncertainty quantification for target estimates in generalization tasks with partially observed data. Overall, our empirical and theoretical analyses suggest a new way to approach the problem of distributional shift, generalizability, and external validity.
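
The idea that the observable covariate shift can bound the unobserved conditional shift can be illustrated with a small sketch. This is not the paper's estimator: the abstract does not define the standardized measures, so the two scale parameters, the `ratio_bound` parameter, and all names below are hypothetical, and sampling noise is ignored. The sketch only assumes that the standardized conditional shift is at most `ratio_bound` times the standardized covariate shift, and turns that assumption into an interval around the reweighted estimate.

```python
from dataclasses import dataclass


@dataclass
class ShiftSummary:
    """Hypothetical inputs; none of these names come from the paper."""
    source_estimate: float      # effect estimate on the source site
    reweighted_estimate: float  # source estimate after covariate reweighting
    covariate_scale: float      # assumed scale used to standardize covariate shift
    conditional_scale: float    # assumed scale used to standardize conditional shift


def standardized_covariate_shift(s: ShiftSummary) -> float:
    """Observable shift: how far reweighting moves the estimate, in scale units."""
    if s.covariate_scale <= 0:
        raise ValueError("covariate_scale must be positive")
    return abs(s.reweighted_estimate - s.source_estimate) / s.covariate_scale


def prediction_interval(s: ShiftSummary, ratio_bound: float = 1.0) -> tuple[float, float]:
    """Interval for the target estimate under the ASSUMPTION that the
    standardized conditional shift is at most ratio_bound times the
    standardized covariate shift. Sampling error is not accounted for."""
    if s.conditional_scale <= 0 or ratio_bound < 0:
        raise ValueError("conditional_scale must be positive and ratio_bound non-negative")
    half_width = ratio_bound * standardized_covariate_shift(s) * s.conditional_scale
    return (s.reweighted_estimate - half_width, s.reweighted_estimate + half_width)


# Illustrative numbers only; they are not taken from either replication dataset.
summary = ShiftSummary(
    source_estimate=0.5,
    reweighted_estimate=0.4,
    covariate_scale=0.05,
    conditional_scale=0.1,
)
lower, upper = prediction_interval(summary)
print(f"standardized covariate shift: {standardized_covariate_shift(summary):.2f}")
print(f"interval for target estimate: [{lower:.2f}, {upper:.2f}]")
```

In this toy setup a larger observed covariate shift widens the interval, whereas a worst-case bound would be wide no matter what the observed data show. A practical version would also need to calibrate `ratio_bound` and add sampling uncertainty.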
