EarthScape: A Multimodal Dataset for Surficial Geologic Mapping and Earth Surface Analysis
Matthew Massey, Abdullah-Al-Zubaer Imran
TL;DR
EarthScape introduces a novel, AI-ready multimodal dataset tailored for surficial geologic mapping and Earth surface analysis. It fuses high-resolution RGB and NIR imagery, DEMs, terrain features across multiple scales, and GIS vector data for hydrology and infrastructure. The dataset comprises 31,018 patches across Warren and Hardin counties, each with 38 input channels and annotations for seven geologic classes, enabling multilabel classification and, potentially, segmentation tasks. Baseline experiments with SGMap-Net show that DEM and elevation-derived features provide strong in-domain performance, while cross-domain generalization remains challenging and multimodal Early Fusion can underperform in unseen regions, underscoring the need for domain-robust fusion strategies. EarthScape is designed as a living benchmark to spur multimodal learning, domain adaptation, and high-resolution geospatial analysis in the geosciences, with room for future expansion and pretrained regional models.
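To make the terms "Early Fusion" and "multilabel" in the summary above concrete, the following is a minimal sketch, not the authors' implementation. It stacks per-modality rasters into a single 38-channel input and encodes the geologic classes present in a patch as a multi-hot vector. The per-modality channel split, the patch size, and the function names are assumptions chosen for illustration; only the totals (38 channels, seven classes) come from the text.

```python
import numpy as np

# Hypothetical channel split chosen only so that it sums to the 38 channels
# stated in the text; the dataset's actual breakdown may differ.
CHANNELS = {"rgb": 3, "nir": 1, "dem": 1, "terrain": 30, "vector": 3}
NUM_CLASSES = 7   # seven surficial geologic classes (from the text)
PATCH = 64        # assumed patch size in pixels, for illustration only


def early_fusion(modalities):
    """Concatenate (C_i, H, W) arrays along the channel axis into (sum C_i, H, W)."""
    return np.concatenate([modalities[name] for name in CHANNELS], axis=0)


def multilabel_target(present, num_classes=NUM_CLASSES):
    """Build a multi-hot vector: 1.0 for every class index present in the patch."""
    target = np.zeros(num_classes, dtype=np.float32)
    target[list(present)] = 1.0
    return target


# Random stand-in data for one patch; real inputs would be co-registered rasters.
rng = np.random.default_rng(0)
patch = {
    name: rng.random((c, PATCH, PATCH), dtype=np.float32)
    for name, c in CHANNELS.items()
}

fused = early_fusion(patch)          # shape (38, 64, 64)
labels = multilabel_target({0, 4})   # classes 0 and 4 present in this patch
print(fused.shape, labels)
```

Because several classes can occur in one patch, the target is a multi-hot vector rather than a single class index. A model trained on such targets would typically use an independent sigmoid output per class, although the text does not specify the loss used by SGMap-Net.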
Abstract
Surficial geologic mapping is essential for understanding Earth surface processes, addressing modern challenges such as climate change and national security, and supporting common applications in engineering and resource management. However, traditional mapping methods are labor-intensive, limiting spatial coverage and introducing potential biases. To address these limitations, we introduce EarthScape, a novel, AI-ready multimodal dataset specifically designed for surficial geologic mapping and Earth surface analysis. EarthScape integrates high-resolution aerial RGB and near-infrared (NIR) imagery, digital elevation models (DEMs), multi-scale DEM-derived terrain features, and hydrologic and infrastructure vector data. The dataset provides detailed annotations for seven distinct surficial geologic classes encompassing various geological processes. We present a comprehensive data processing pipeline built on open-source raw data and establish baseline benchmarks using different spatial modalities to demonstrate the utility of EarthScape. As a living dataset with a vision for expansion, EarthScape bridges the gap between computer vision and Earth sciences, offering a valuable resource for advancing research in multimodal learning, geospatial analysis, and geological mapping. Our code is available at https://github.com/masseygeo/earthscape.
