Geospatial Machine Learning Libraries

Adam J. Stewart, Caleb Robinson, Arindam Banerjee

TL;DR

This chapter surveys the geospatial machine learning (GeoML) library landscape, detailing how domain-specific tools address geospatial data challenges that standard ML pipelines struggle with. It analyzes the evolution of GeoML tooling, with in-depth looks at TorchGeo, eo-learn, and Raster Vision, and demonstrates practical workflows through a crop-type mapping case study. The discussion covers data formats, benchmarking, licensing, CI, and governance, highlighting both progress and persistent bottlenecks in reproducibility and scalability. Looking ahead, the chapter emphasizes the emergence of foundation models and embeddings, reuse and ease-of-use improvements, and independent governance as shaping forces for a more interoperable and sustainable GeoML ecosystem.

Abstract

Recent advances in machine learning have been supported by the emergence of domain-specific software libraries, enabling streamlined workflows and increased reproducibility. For geospatial machine learning (GeoML), the availability of Earth observation data has outpaced the development of domain libraries to handle its unique challenges, such as varying spatial resolutions, spectral properties, temporal cadence, data coverage, coordinate systems, and file formats. This chapter presents a comprehensive overview of GeoML libraries, analyzing their evolution, core functionalities, and the current ecosystem. It also introduces popular GeoML libraries such as TorchGeo, eo-learn, and Raster Vision, detailing their architecture, supported data types, and integration with ML frameworks. Additionally, it discusses common methodologies for data preprocessing, spatial-temporal joins, benchmarking, and the use of pretrained models. Through a case study in crop type mapping, it demonstrates practical applications of these tools. Best practices in software design, licensing, and testing are highlighted, along with open challenges and future directions, particularly the rise of foundation models and the need for governance in open-source geospatial software. Our aim is to guide practitioners, developers, and researchers in navigating and contributing to the rapidly evolving GeoML landscape.

Paper Structure

This paper contains 22 sections, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Timeline of GeoML library development. Libraries under active development (defined as having a new commit within the last year) are shown in black; inactive libraries are shown in red. Development of SPy and OTB stretches back to 2001 and 2006, respectively, so their bars are truncated to focus on more recent developments. Bars denote the earliest and most recent commits (as of September 2025).
  • Figure 2: A common use case for GeoML practitioners is joining raster data, such as satellite imagery (top left), with vector data, such as crop masks (bottom left), then randomly sampling patches from the intersection of these datasets to use in modeling pipelines (right).
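
The join-and-sample workflow described in the Figure 2 caption can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not the API of TorchGeo, eo-learn, or Raster Vision: the names `BBox`, `intersect`, and `sample_patches` and the extents used are hypothetical, and the sketch assumes both datasets have already been reprojected to a shared coordinate reference system.

```python
import random
from typing import NamedTuple


class BBox(NamedTuple):
    """Axis-aligned bounding box in a shared coordinate reference system."""
    minx: float
    miny: float
    maxx: float
    maxy: float


def intersect(a: BBox, b: BBox) -> BBox:
    """Return the overlap of two boxes, or raise if they do not overlap."""
    box = BBox(max(a.minx, b.minx), max(a.miny, b.miny),
               min(a.maxx, b.maxx), min(a.maxy, b.maxy))
    if box.minx >= box.maxx or box.miny >= box.maxy:
        raise ValueError("bounding boxes do not overlap")
    return box


def sample_patches(region: BBox, size: float, n: int, seed: int = 0) -> list[BBox]:
    """Draw n square patches of side `size` uniformly at random inside region."""
    if size > region.maxx - region.minx or size > region.maxy - region.miny:
        raise ValueError("patch is larger than the region")
    rng = random.Random(seed)  # seeded so the sampled patches are reproducible
    patches = []
    for _ in range(n):
        # Pick the lower-left corner so the whole patch stays inside the region.
        x = rng.uniform(region.minx, region.maxx - size)
        y = rng.uniform(region.miny, region.maxy - size)
        patches.append(BBox(x, y, x + size, y + size))
    return patches


# Hypothetical extents (same CRS, units in metres) for an image and a crop mask.
imagery = BBox(0.0, 0.0, 1000.0, 1000.0)
crop_mask = BBox(400.0, 200.0, 1500.0, 900.0)

overlap = intersect(imagery, crop_mask)            # region covered by both layers
patches = sample_patches(overlap, size=64.0, n=5)  # windows to read from each layer
```

In a real pipeline, each sampled box would then be used to read a window from the raster and to rasterize or crop the vector labels over the same extent, so that image and mask patches stay aligned.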