Earth Embeddings Reveal Diverse Urban Signals from Space

Wenjing Gong, Udbhav Srivastava, Yuchen Wang, Yuhao Jia, Qifan Wu, Weishan Bai, Yifan Yang, Xiao Huang, Xinyue Ye

Abstract

Conventional urban indicators derived from censuses, surveys, and administrative records are often costly, spatially inconsistent, and slow to update. Recent geospatial foundation models enable Earth embeddings, compact satellite image representations transferable across downstream tasks, but their utility for neighborhood-scale urban monitoring remains unclear. Here, we benchmark three Earth embedding families, AlphaEarth, Prithvi, and Clay, for urban signal prediction across six U.S. metropolitan areas from 2020 to 2023. Using a unified supervised-learning framework, we predict 14 neighborhood-level indicators spanning crime, income, health, and travel behavior, and evaluate performance under four settings: global, city-wise, year-wise, and city-year. Results show that Earth embeddings capture substantial urban variation, with the highest predictive skill for outcomes more directly tied to built-environment structure, including chronic health burdens and dominant commuting modes. By contrast, indicators shaped more strongly by fine-scale behavior and local policy, such as cycling, remain difficult to infer. Predictive performance varies markedly across cities but remains comparatively stable across years, indicating strong spatial heterogeneity alongside temporal robustness. Exploratory analysis suggests that cross-city variation in predictive performance is associated with urban form in task-specific ways. Controlled dimensionality experiments show that representation efficiency is critical: compact 64-dimensional AlphaEarth embeddings remain more informative than 64-dimensional reductions of Prithvi and Clay. This study establishes a benchmark for evaluating Earth embeddings in urban remote sensing and demonstrates their potential as scalable, low-cost features for SDG-aligned neighborhood-scale urban monitoring.
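The four evaluation settings named in the abstract (global, city-wise, year-wise, and city-year) can be illustrated with a minimal sketch that groups held-out predictions and computes R² per group. The record layout, the `SETTINGS` mapping, and the `evaluate` helper are assumptions made for illustration. The abstract does not say whether separate models are fit per subset, so this sketch only covers how scores might be partitioned, not the authors' training procedure.

```python
from collections import defaultdict


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    # R² is undefined when the targets have no variance.
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")


# Hypothetical mapping from setting name to a grouping key.
SETTINGS = {
    "global": lambda r: "all",
    "city-wise": lambda r: r["city"],
    "year-wise": lambda r: r["year"],
    "city-year": lambda r: (r["city"], r["year"]),
}


def evaluate(records, setting):
    """Return {group key: R²} for one evaluation setting.

    Each record is a dict with keys "city", "year", "y_true", "y_pred"
    (an assumed layout, not taken from the paper).
    """
    key_fn = SETTINGS[setting]
    groups = defaultdict(lambda: ([], []))
    for r in records:
        y_true, y_pred = groups[key_fn(r)]
        y_true.append(r["y_true"])
        y_pred.append(r["y_pred"])
    return {k: r2_score(t, p) for k, (t, p) in groups.items()}


# Toy records with made-up values, only to show the call pattern.
records = [
    {"city": "A", "year": 2020, "y_true": 1.0, "y_pred": 1.0},
    {"city": "A", "year": 2020, "y_true": 3.0, "y_pred": 3.0},
    {"city": "B", "year": 2021, "y_true": 2.0, "y_pred": 3.0},
    {"city": "B", "year": 2021, "y_true": 4.0, "y_pred": 3.0},
]
city_scores = evaluate(records, "city-wise")
global_scores = evaluate(records, "global")
```

In this toy case the pooled score sits between the two per-city scores, which shows why a single global R² can hide the kind of cross-city variation the abstract reports.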

Paper Structure

This paper contains 17 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Study area and research framework. (a) The selected six MSAs: Atlanta–Sandy Springs–Alpharetta, GA (Atlanta), Chicago–Naperville–Elgin, IL–IN–WI (Chicago), Houston–The Woodlands–Sugar Land, TX (Houston), Los Angeles–Long Beach–Anaheim, CA (Los Angeles), New York–Newark–Jersey City, NY–NJ (New York), and Seattle–Tacoma–Bellevue, WA (Seattle). (b) Land area (km²) of the six MSAs. (c) Total population from 2020 to 2023 for each MSA. (d) Conceptual framework of Earth-embedding-based urban signal prediction.
  • Figure 2: Benchmarking Earth embeddings for predicting urban signals. (a) Global predictive performance (test R²) of three Earth embeddings (AlphaEarth, Prithvi, and Clay) across 14 urban indicators spanning crime, income, health, and travel behavior for six US MSAs over 2020–2023 (see Supplementary Table 2 for details). (b) Global mean R² by thematic domain, obtained by averaging across indicators, cities, and years, summarizing domain-level differences in model skill. (c) Distribution of city–year experiment test R² values across all indicators, showing the variability and upper-tail behavior of each embedding.
  • Figure 3: Cross-city heterogeneity in the predictability of urban signals. (a) City-wise predictive performance (test R²) of three Earth embeddings (AlphaEarth, Prithvi, and Clay) across four domains for six US MSAs over 2020–2023. (b) Hierarchical clustering of MSAs based on AlphaEarth’s mean R² across domains. (c–e) Exploratory relationships between urban form indicators and domain-specific AlphaEarth predictive performance: (c) population density, (d) employment and household entropy, and (e) walkability index. Colored lines show simple domain-specific trend lines, and $\rho$ indicates Spearman’s rank correlation.
  • Figure 4: Consistency of urban signal prediction across years (2020–2023). Annual predictive performance (test R²) for four urban domains (crime, income, health, and travel) from 2020 to 2023, aggregated across all indicators and MSAs for (a) AlphaEarth, (b) Prithvi, and (c) Clay. Lines show the domain-wise mean R² in each year, and shaded bands indicate variability across indicators.
  • Figure 5: Global information density and representation efficiency of Earth embeddings. Mean predictive performance (test R²) by domain (crime, income, health, and travel) for (a) Prithvi and (b) Clay using their original high-dimensional embeddings and five 64-d compressed variants (FA-64, Isomap-64, kPCA-64, PCA-64, and RP-64). Bars show domain-wise mean R² aggregated across all indicators, MSAs, and years. The dashed line with stars denotes the original 64-d AlphaEarth performance.
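The Figure 3 caption reports Spearman's rank correlation ($\rho$) between urban form indicators and predictive performance. As a reminder of what that statistic measures, here is a minimal sketch that computes it as the Pearson correlation of average ranks. The helper names and the example numbers are hypothetical and are not values from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up values for six cities: an urban form indicator and a mean R².
density = [1200.0, 3400.0, 900.0, 2800.0, 5100.0, 1500.0]
mean_r2 = [0.31, 0.44, 0.28, 0.47, 0.52, 0.35]
rho = spearman_rho(density, mean_r2)
```

Because the statistic depends only on rank order, it suits a comparison across six MSAs where a linear relationship cannot be assumed. With so few points, any such correlation is best read as exploratory, in line with the caption's wording.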