Program Evaluation with Remotely Sensed Outcomes
Ashesh Rambachan, Rahul Singh, Davide Viviano
TL;DR
The paper addresses bias in program evaluation when economic outcomes are replaced by remotely sensed variables (RSVs) that are post-outcome variables, meaning the outcome causes the RSV. It develops a nonparametric identification strategy based on the stability of the conditional distribution of the RSV, given the outcome and treatment, across the experimental and observational samples; the resulting identification formula combines both data sources. A key contribution is showing that efficient inference calls for an RSV representation built from predictions of three quantities (the outcome, the treatment, and the sample indicator), and that valid inference requires no rate conditions on these RSV predictions, which permits complex deep learning predictors. The authors provide a practical estimation procedure with cross-fitting and bootstrap inference, along with three diagnostics. In semi-synthetic and real-data experiments based on an anti-poverty program in India, they report that the method recovers true effects and can substantially reduce survey costs while delivering reliable inference.
Abstract
Economists often estimate treatment effects in experiments using remotely sensed variables (RSVs), e.g., satellite images or mobile phone activity, in place of directly measured economic outcomes. A common practice is to use an observational sample to train a predictor of the economic outcome from the RSV, and then use these predictions as the outcomes in the experiment. We show that this method is biased whenever the RSV is a post-outcome variable, meaning that variation in the economic outcome causes variation in the RSV. For example, changes in poverty or environmental quality cause changes in satellite images, but not vice versa. As our main result, we nonparametrically identify the treatment effect by formalizing the intuition underlying common practice: the conditional distribution of the RSV given the outcome and treatment is stable across samples. Our identifying formula reveals that efficient inference requires predictions of three quantities from the RSV -- the outcome, treatment, and sample indicator -- whereas common practice only predicts the outcome. Valid inference does not require any rate conditions on RSV predictions, justifying the use of complex deep learning algorithms with unknown statistical properties. We reanalyze the effect of an anti-poverty program in India using satellite images.
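The bias of common practice can be illustrated with a small simulation. The sketch below is a toy setup and not the paper's estimator: every number, distribution, and function name in it is an assumption made for illustration. It assumes a binary outcome, a scalar RSV equal to the outcome plus Gaussian noise, and an RSV that depends on treatment only through the outcome, which is stronger than the stability condition stated in the abstract. The closed-form posterior stands in for a well-trained predictor. Under these assumptions the difference in mean predictions equals the true effect multiplied by a factor below one, so common practice is attenuated, and dividing by that factor (estimated in the observational sample) undoes the attenuation in this toy case only.

```python
import numpy as np

rng = np.random.default_rng(0)
SIGMA = 1.0          # assumed noise level of the RSV
N = 200_000          # assumed size of each sample


def draw_rsv(y):
    """Post-outcome RSV: the outcome causes the RSV (R = Y + noise)."""
    return y + SIGMA * rng.normal(size=y.shape)


def predict_outcome(r):
    """P(Y=1 | R=r) in the observational sample.

    Closed form for P_obs(Y=1) = 0.5 and R | Y ~ N(Y, SIGMA^2); it stands
    in for a predictor trained on the observational sample.
    """
    return 1.0 / (1.0 + np.exp(-(r - 0.5) / SIGMA**2))


# Observational sample: outcome and RSV observed, no experiment.
y_obs = rng.binomial(1, 0.5, size=N)
r_obs = draw_rsv(y_obs)

# Experimental sample: treatment and RSV observed; the outcome is kept
# here only to compute a benchmark that would be unavailable in practice.
d = rng.binomial(1, 0.5, size=N)
y_exp = rng.binomial(1, 0.3 + 0.3 * d)   # assumed true effect of 0.3
r_exp = draw_rsv(y_exp)

# Common practice: use predicted outcomes as if they were outcomes.
pred_exp = predict_outcome(r_exp)
naive = pred_exp[d == 1].mean() - pred_exp[d == 0].mean()

# Toy correction: rescale by how far the mean prediction moves between
# Y=1 and Y=0 units in the observational sample.
pred_obs = predict_outcome(r_obs)
scale = pred_obs[y_obs == 1].mean() - pred_obs[y_obs == 0].mean()
rescaled = naive / scale

benchmark = y_exp[d == 1].mean() - y_exp[d == 0].mean()

print(f"benchmark (uses true outcomes): {benchmark:.3f}")
print(f"common practice (predictions):  {naive:.3f}")
print(f"rescaled toy correction:        {rescaled:.3f}")
```

The rescaling works here only because the toy RSV carries no direct treatment effect; the abstract's identification result conditions on both the outcome and the treatment and uses predictions of the outcome, treatment, and sample indicator.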
