KidSat: satellite imagery to map childhood poverty dataset and benchmark

Makkunda Sharma; Fan Yang; Duy-Nhat Vo; Esra Suel; Swapnil Mishra; Samir Bhatt; Oliver Fiala; William Rudgard; Seth Flaxman

TL;DR

The paper introduces KidSat, a dataset linking high-resolution satellite imagery with DHS-derived ground truth on multidimensional child poverty across 19 countries in Eastern and Southern Africa from 1997 to 2022. It benchmarks multiple models, including MOSAIKS, DINOv2, and SatMAE, on spatial and temporal generalization tasks, and provides open-source code for dataset construction and evaluation. Findings show that foundation models, especially when fine-tuned with DHS variables, improve spatial poverty prediction over baselines, while temporal forecasting remains more challenging due to distribution shifts. This work offers a scalable resource for fine-grained poverty mapping and policy analysis, highlighting practical trade-offs between imagery resolution, model type, and compute resources.

Abstract

Satellite imagery has emerged as an important tool to analyse demographic, health, and development indicators. While various deep learning models have been built for these tasks, each is specific to a particular problem, with few standard benchmarks available. We propose a new dataset pairing satellite imagery and high-quality survey data on child poverty to benchmark satellite feature representations. Our dataset consists of 33,608 images, each 10 km $\times$ 10 km, from 19 countries in Eastern and Southern Africa in the time period 1997-2022. As defined by UNICEF, multidimensional child poverty covers six dimensions and it can be calculated from the face-to-face Demographic and Health Surveys (DHS) Program. As part of the benchmark, we test spatial as well as temporal generalization, by testing on unseen locations, and on data after the training years. Using our dataset we benchmark multiple models, from low-level satellite imagery models such as MOSAIKS, to deep learning foundation models, which include both generic vision models such as Self-Distillation with no Labels (DINOv2) models and specific satellite imagery models such as SatMAE. We provide open source code for building the satellite dataset, obtaining ground truth data from DHS and running various models assessed in our work.

Paper Structure (34 sections, 1 figure, 2 tables)

Introduction
Related Work
Existing Satellite Imagery Datasets
Satellite Imagery for Demographic and Health Indicators
Foundation Satellite Image Models
Dataset
Satellite Images
Demographic Health Surveys and Child Poverty
Benchmark
Spatial
Temporal
Models to be Compared
Evaluation and Fine-tuning
Results
Spatial Benchmark
...and 19 more sections

Figures (1)

Figure 1: Estimates of the prevalence of severe deprivation for Kenya in 2022. (a) shows predictions using a spatial statistics approach, kriging on the cluster locations using Kenya DHS 2022 data only with a spherical variogram. (b) shows predictions from DINOv2 fine-tuned on the KidSat spatial dataset, in which 20% of all clusters in Eastern and Southern Africa were held out. (c) shows predictions from DINOv2 fine-tuned on the KidSat temporal dataset, in which the training data was from before 2020.
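
The two hold-out schemes named in the caption (20% of clusters held out for the spatial dataset, training data from before 2020 for the temporal dataset) can be illustrated with a minimal Python sketch. This is not the authors' released code: the `Cluster` record, its field names, the random seed, and the example rows are all hypothetical, and the paper's exact splitting procedure may differ (for instance in how folds are formed).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    # Hypothetical record for one DHS survey cluster; field names are assumptions.
    cluster_id: str
    country: str
    year: int  # survey year


def spatial_split(clusters, test_fraction=0.2, seed=0):
    """Hold out a random fraction of clusters, regardless of survey year."""
    shuffled = list(clusters)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def temporal_split(clusters, cutoff_year=2020):
    """Train on surveys before the cutoff year, test on surveys from it onward."""
    train = [c for c in clusters if c.year < cutoff_year]
    test = [c for c in clusters if c.year >= cutoff_year]
    return train, test


# Synthetic example records, not real DHS data: survey years 2014, 2018, 2022.
clusters = [Cluster(f"c{i}", "KE", 2014 + (i % 3) * 4) for i in range(10)]

spatial_train, spatial_test = spatial_split(clusters)
temporal_train, temporal_test = temporal_split(clusters)
```

In the spatial case a test cluster may share its survey year with training clusters, so the model is asked to generalize to unseen locations; in the temporal case every test cluster comes from a later survey than any training cluster, which is where distribution shift over time shows up.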
