Generalizable Slum Detection from Satellite Imagery with Mixture-of-Experts

Sumin Lee, Sungwon Park, Jeasurk Yang, Jihee Kim, Meeyoung Cha

TL;DR

GRAM tackles cross-region generalization in satellite-based slum segmentation by coupling a Mixture-of-Experts backbone with test-time adaptation. It learns region-specific adapters via adaptive routing, while enforcing universal representations through mutual-information regularization, and filters pseudo-labels with cross-region prediction consistency during target adaptation. Evaluated on a large, multi-continental dataset with three unseen African cities, GRAM outperforms state-of-the-art baselines, especially in low-resource settings, demonstrating label-efficient scalability for global slum monitoring. By enabling temporal tracking of informal settlements, GRAM provides actionable, data-driven insights to support urban policy and planning in data-scarce contexts.

Abstract

Satellite-based slum segmentation holds significant promise in generating global estimates of urban poverty. However, the morphological heterogeneity of informal settlements presents a major challenge, hindering the ability of models trained on specific regions to generalize effectively to unseen locations. To address this, we introduce a large-scale high-resolution dataset and propose GRAM (Generalized Region-Aware Mixture-of-Experts), a two-phase test-time adaptation framework that enables robust slum segmentation without requiring labeled data from target regions. We compile a million-scale satellite imagery dataset from 12 cities across four continents for source training. Using this dataset, the model employs a Mixture-of-Experts architecture to capture region-specific slum characteristics while learning universal features through a shared backbone. During adaptation, prediction consistency across experts filters out unreliable pseudo-labels, allowing the model to generalize effectively to previously unseen regions. GRAM outperforms state-of-the-art baselines in low-resource settings such as African cities, offering a scalable and label-efficient solution for global slum mapping and data-driven urban planning.
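The abstract states that prediction consistency across experts is used to filter unreliable pseudo-labels, but it does not give the exact criterion. The sketch below shows one plausible reading under stated assumptions: each expert votes slum or non-slum per pixel, and a pixel keeps its pseudo-label only when a sufficient fraction of experts agree with the majority. The function name `filter_pseudo_labels` and the parameters `agree_frac` and `prob_threshold` are hypothetical and are not taken from the paper.

```python
from typing import List, Optional


def filter_pseudo_labels(
    expert_probs: List[List[float]],
    agree_frac: float = 1.0,
    prob_threshold: float = 0.5,
) -> List[Optional[int]]:
    """Return one pseudo-label per pixel, or None where experts disagree.

    expert_probs[e][p] is expert e's slum probability for pixel p.
    agree_frac and prob_threshold are illustrative assumptions, not
    values reported in the paper.
    """
    num_experts = len(expert_probs)
    num_pixels = len(expert_probs[0])
    labels: List[Optional[int]] = []
    for p in range(num_pixels):
        # Binarize each expert's prediction for this pixel.
        votes = [1 if expert_probs[e][p] >= prob_threshold else 0
                 for e in range(num_experts)]
        slum_votes = sum(votes)
        # Majority label (ties resolved toward the slum class).
        majority = 1 if 2 * slum_votes >= num_experts else 0
        agreeing = slum_votes if majority == 1 else num_experts - slum_votes
        # Keep the label only if enough experts agree; otherwise mask it out.
        keep = agreeing / num_experts >= agree_frac
        labels.append(majority if keep else None)
    return labels


# Three experts, three pixels: the experts disagree on the last pixel.
probs = [
    [0.90, 0.20, 0.60],
    [0.80, 0.10, 0.40],
    [0.95, 0.30, 0.70],
]
strict = filter_pseudo_labels(probs)                   # unanimous agreement required
relaxed = filter_pseudo_labels(probs, agree_frac=0.6)  # two of three is enough
```

With unanimous agreement required, the third pixel is masked out (`None`) and would be excluded from the adaptation loss. Relaxing `agree_frac` trades pseudo-label purity for coverage.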

Paper Structure

This paper contains 32 sections, 9 equations, 6 figures, and 9 tables.

Figures (6)

  • Figure 1: Our source domain spans 12 cities across four continents; the target domain covers three African cities. Morphological similarities guide the model to prioritize Cape Town features for detecting slums in Dar es Salaam and Maputo (red), and Nairobi features for Kampala (blue).
  • Figure 2: Example of labeled satellite imagery used for training. (a) Original satellite image tile. (b) Corresponding binary mask overlaid in red, indicating slum areas (label=1), while black regions denote non-slum areas (label=0).
  • Figure 3: Overview of the Mixture-of-Experts (MoE) architecture in GRAM. The diagram illustrates the integration of lightweight MoE blocks $\mathcal{F}$ into the transformer encoder, with region-specific gating networks $g_d$ dynamically routing token features $z$ to a top-$k$ subset of expert adapters $\{\mathcal{E}_e\}_{e=1}^E$.
  • Figure 4: Jaccard similarity between the image sets of the three test cities (Dar es Salaam, Kampala, and Maputo) and those in the training set. Higher values indicate greater visual similarity in slum characteristics across cities. Red boxes denote the predictions made by the region classifier.
  • Figure 5: Slum segmentation results in Kampala in 2015 (yellow) and 2023 (red). Over the eight-year period, the slum ratio in the city increased from 8.4% to 8.6%.
  • ...and 1 more figure
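The routing described in the Figure 3 caption, where a gating network $g_d$ sends token features $z$ to a top-$k$ subset of expert adapters $\{\mathcal{E}_e\}_{e=1}^E$, can be sketched as follows. This is a minimal illustration of generic top-$k$ Mixture-of-Experts routing, not the authors' implementation. The names `top_k_route` and `moe_block` are hypothetical, the gate logits are passed in precomputed rather than produced by a learned $g_d$, and renormalizing the selected logits with a softmax is an assumption. How the block output is merged back into the transformer encoder is not specified in the caption and is left out.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def top_k_route(gate_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Select the k largest gate logits and softmax-normalize them.

    Returns (expert_index, weight) pairs whose weights sum to 1.
    """
    top = sorted(range(len(gate_logits)),
                 key=lambda e: gate_logits[e], reverse=True)[:k]
    # Subtract the maximum logit for numerical stability.
    m = max(gate_logits[e] for e in top)
    exps = [math.exp(gate_logits[e] - m) for e in top]
    total = sum(exps)
    return [(e, w / total) for e, w in zip(top, exps)]


def moe_block(z: Vector, gate_logits: Sequence[float],
              experts: Sequence[Expert], k: int = 2) -> Vector:
    """Weighted sum of the outputs of the k selected experts for one token."""
    out = [0.0] * len(z)
    for e, w in top_k_route(gate_logits, k):
        y = experts[e](z)  # only the selected experts are evaluated
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


# Toy experts that scale the token by 1, 2, 3, and 4 (illustrative only).
experts = [lambda z, s=s: [s * x for x in z] for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.0, 2.0, 2.0, -1.0]  # stand-in for g_d(z)
routes = top_k_route(logits, k=2)
out = moe_block([1.0, -2.0], logits, experts, k=2)
```

Here experts 1 and 2 tie for the highest logit, so each receives weight 0.5 and the output is the average of their scaled tokens. Evaluating only the selected experts is what keeps the per-token cost roughly constant as the number of experts grows.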