A Scoping Review of Earth Observation and Machine Learning for Causal Inference: Implications for the Geography of Poverty

Kazuki Sakamoto, Connor T. Jerzak, Adel Daoud

TL;DR

This paper examines how Earth observation (EO) data and machine learning (ML) can support causal inference about the geography of poverty. It surveys the nascent EO-ML causal inference literature and identifies five core approaches: outcome imputation, EO-based deconfounding, treatment effect heterogeneity, transportability, and image-informed causal discovery. It also provides a practical, three-stage protocol for integrating EO data into causal analyses. The review finds that most work to date addresses environmental or methodological questions rather than poverty outcomes directly, and it highlights challenges including multi-resolution and multi-source data, information leakage, and spatial dependence. The protocol offers concrete guidance on data choices, computer vision representations, and evaluation, with the aim of improving causal credibility and enabling finer-grained policy insights; it also applies to Sustainable Development Goal (SDG) domains beyond poverty.

Abstract

Earth observation (EO) data such as satellite imagery can have far-reaching impacts on our understanding of the geography of poverty, especially when coupled with machine learning (ML) and computer vision. Early research used computer vision to predict living conditions in areas with limited data, but recent studies increasingly focus on causal analysis. Despite this shift, the use of EO-ML methods for causal inference lacks thorough documentation, and best practices are still developing. Through a comprehensive scoping review, we catalog the current literature on EO-ML methods in causal analysis. We synthesize five principal approaches to incorporating EO data in causal workflows: (1) outcome imputation for downstream causal analysis, (2) EO image deconfounding, (3) EO-based treatment effect heterogeneity, (4) EO-based transportability analysis, and (5) image-informed causal discovery. Building on these findings, we provide a detailed protocol guiding researchers in integrating EO data into causal analysis -- covering data requirements, computer vision model selection, and evaluation metrics. While our focus centers on health and living conditions outcomes, our protocol is adaptable to other sustainable development domains utilizing EO data.

Paper Structure

This paper contains 13 sections, 6 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Design for scoping literature review on causal EO-ML work.
  • Figure 2: Trends in research output on the intersection of causal inference, machine learning, and Earth observation.
  • Figure 3: Summary of papers in the causal EO-ML literature. $*$ denotes a preprint (as of July 2024).
  • Figure 4: Summary of papers in the causal EO-ML literature regarding multi-resolution, multi-phase, and multi-source considerations.
  • Figure 5: Base graphs of causal EO-ML methods. Images are often seen as a proxy for other important unobserved factors, $U_i$. $\mathbf{M}_i$ represents a satellite image array, $Y_i$ denotes the outcome, $A_i$ denotes the intervention of interest, and $\tau_i$ denotes the causal effect of $A_i$ on $Y_i$. $U_i$ denotes unobserved factors that help drive the causal system and that are indirectly captured in satellite image representations. Diamond nodes represent selection or missingness indicators. Dotted lines represent relationships of unknown or indeterminate directionality.
  • ...and 1 more figures