Optimizing Multi-Scale Representations to Detect Effect Heterogeneity Using Earth Observation and Computer Vision: Applications to Two Anti-Poverty RCTs
Fucheng Warren Zhu, Connor T. Jerzak, Adel Daoud
TL;DR
This paper tackles the scale-sensitivity challenge in causal inference based on Earth observation (EO) imagery by introducing Multi-Scale Representation Concatenation, a composable method that converts single-scale conditional average treatment effect (CATE) pipelines into multi-scale ones by concatenating representations from two or more image scales. Using CLIP-based satellite image encodings and Causal Forests, the authors show through simulations and two anti-poverty RCTs (Peru and Uganda) that multi-scale representations can improve the detection of treatment effect heterogeneity, as measured by the RATE Ratio, without requiring new multi-scale architectures. A grid search over scale pairs identifies effective combinations: pairing a smaller local scale with a larger contextual scale often outperforms single-scale approaches, and adding more scales improves heterogeneity detection up to a point. The work offers a practical, interpretable framework for incorporating multi-scale information into EO-based causal inference, with potential policy implications for targeting and impact evaluation, while acknowledging limitations related to identification assumptions and privacy concerns.
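The core operation (encode each location's imagery at several scales, concatenate the embeddings, and grid-search over scale pairs) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: `toy_encoder` is a hypothetical stand-in for a CLIP/ViT image encoder (it returns per-channel means and standard deviations), scales are assumed to be square center crops measured in pixels, and `score_fn` is a hypothetical placeholder for fitting a Causal Forest on the concatenated features and computing a heterogeneity metric such as the RATE Ratio.

```python
import itertools

import numpy as np


def center_crop(image: np.ndarray, size: int) -> np.ndarray:
    """Return a size x size window centred on an (H, W, C) image."""
    h, w = image.shape[:2]
    if size > min(h, w):
        raise ValueError("crop size exceeds image extent")
    top, left = (h - size) // 2, (w - size) // 2
    return image[top:top + size, left:left + size]


def toy_encoder(patch: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a CLIP/ViT encoder: per-channel mean and std."""
    flat = patch.reshape(-1, patch.shape[-1])
    return np.concatenate([flat.mean(axis=0), flat.std(axis=0)])


def multiscale_features(images, scales, encoder=toy_encoder) -> np.ndarray:
    """Encode every image at each scale and concatenate the embeddings per unit."""
    return np.stack([
        np.concatenate([encoder(center_crop(img, s)) for s in scales])
        for img in images
    ])


def grid_search_scale_pairs(images, candidate_scales, score_fn):
    """Score every (smaller, larger) scale pair; return the best pair and all scores.

    `score_fn` is a hypothetical hook: it receives the concatenated feature
    matrix and should return a scalar where higher is better.
    """
    results = {}
    for pair in itertools.combinations(sorted(candidate_scales), 2):
        results[pair] = score_fn(multiscale_features(images, pair))
    best = max(results, key=results.get)
    return best, results
```

In practice `score_fn` would wrap the CATE estimator and the evaluation metric. The grid search only needs the concatenated feature matrix for each candidate pair, which is what lets the procedure compose with an arbitrary single-scale pipeline.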
Abstract
Earth Observation (EO) data are increasingly used in policy analysis because they enable granular estimation of conditional average treatment effects (CATE). However, a key challenge in EO-based causal inference is choosing the scale of the input satellite imagery, which involves a trade-off between capturing fine-grained individual heterogeneity in smaller images and broader contextual information in larger ones. This paper introduces Multi-Scale Representation Concatenation, a set of composable procedures that transform arbitrary single-scale EO-based CATE estimation algorithms into multi-scale ones. We benchmark Multi-Scale Representation Concatenation on a CATE estimation pipeline that combines Vision Transformer (ViT) models, which encode the images, with Causal Forests (CFs), which estimate CATEs from those encodings. We first perform simulation studies in which the causal mechanism is known, showing that, as measured by $R^2$, our multi-scale approach captures information relevant to effect heterogeneity that single-scale ViT models miss. We then apply the multi-scale method to two randomized controlled trials (RCTs) conducted in Peru and Uganda using Landsat satellite imagery. Because ground-truth CATEs are unavailable in the RCT analysis, we assess performance with the Rank Average Treatment Effect Ratio (RATE Ratio). Results indicate that Multi-Scale Representation Concatenation improves the performance of deep learning models in EO-based CATE estimation without the complexity of designing new multi-scale architectures for a specific use case. Applying Multi-Scale Representation Concatenation could yield meaningful policy benefits, e.g., potentially increasing the impact of poverty alleviation programs without additional resource expenditure.
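Because the RCT analyses lack ground-truth CATEs, evaluation relies on how well the estimated CATEs rank units. The sketch below computes the area under the targeting operator characteristic (AUTOC), one form of the RATE, from inverse-propensity-weighted (IPW) scores, which are unbiased for the CATE when the assignment probability `p` is known, as in an RCT. It is an illustrative assumption rather than the paper's code: libraries such as grf use doubly robust scores and bootstrap standard errors, and the RATE Ratio reported in the paper is built on a RATE estimate whose exact normalization is not reproduced here.

```python
import numpy as np


def ipw_scores(y, w, p):
    """Per-unit effect scores for an RCT with known treatment probability p.

    E[score | X] equals the CATE at X when assignment is randomized.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    return w * y / p - (1.0 - w) * y / (1.0 - p)


def autoc(priority, scores):
    """Area under the TOC curve (a RATE metric) for a prioritization rule.

    TOC(q) = mean score among the top-q fraction of units ranked by
    `priority`, minus the overall mean score. This simple plug-in version
    averages TOC over q = 1/n, 2/n, ..., 1; ties are broken by input order.
    """
    order = np.argsort(-np.asarray(priority), kind="stable")
    sorted_scores = np.asarray(scores, dtype=float)[order]
    n = len(sorted_scores)
    top_q_means = np.cumsum(sorted_scores) / np.arange(1, n + 1)
    toc = top_q_means - sorted_scores.mean()
    return float(toc.mean())
```

Ranking units by a CATE estimate that tracks the true effects yields a positive AUTOC, and reversing the ranking flips its sign: for example, `autoc([3, 1, 2, 0], [3, 1, 2, 0])` returns 0.75, while `autoc([-3, -1, -2, 0], [3, 1, 2, 0])` returns -0.75.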
