A Scoping Review of Earth Observation and Machine Learning for Causal Inference: Implications for the Geography of Poverty
Kazuki Sakamoto, Connor T. Jerzak, Adel Daoud
TL;DR
This paper addresses the problem of inferring causal relationships in the geography of poverty using Earth Observation (EO) data and machine learning (ML). It surveys the nascent literature on EO-ML causal inference and identifies five core approaches: outcome imputation, EO-based deconfounding, treatment effect heterogeneity, transportability, and image-informed causal discovery. It also provides a practical, three-stage protocol for integrating EO data into causal analyses. The review finds that most work to date focuses on environmental or methodological questions rather than direct poverty outcomes, and highlights challenges such as multi-resolution and multi-source data, information leakage, and spatial dependence. The proposed protocol offers concrete guidance on data choices, computer vision representations, and evaluation, aiming to improve causal credibility and enable finer-grained policy insights, with applicability to other Sustainable Development Goal (SDG) domains beyond poverty.
Abstract
Earth observation (EO) data such as satellite imagery can have far-reaching impacts on our understanding of the geography of poverty, especially when coupled with machine learning (ML) and computer vision. Early research used computer vision to predict living conditions in areas with limited data, but recent studies increasingly focus on causal analysis. Despite this shift, the use of EO-ML methods for causal inference lacks thorough documentation, and best practices are still developing. Through a comprehensive scoping review, we catalog the current literature on EO-ML methods in causal analysis. We synthesize five principal approaches to incorporating EO data in causal workflows: (1) outcome imputation for downstream causal analysis, (2) EO image deconfounding, (3) EO-based treatment effect heterogeneity, (4) EO-based transportability analysis, and (5) image-informed causal discovery. Building on these findings, we provide a detailed protocol guiding researchers in integrating EO data into causal analysis -- covering data requirements, computer vision model selection, and evaluation metrics. While our focus centers on health and living conditions outcomes, our protocol is adaptable to other sustainable development domains utilizing EO data.
