Mission Critical -- Satellite Data is a Distinct Modality in Machine Learning

Esther Rolf, Konstantin Klemmer, Caleb Robinson, Hannah Kerner

TL;DR

This position paper argues that satellite data form a distinct ML modality that is not adequately served by lift-and-shift approaches borrowed from natural images or text. It outlines the characteristics that set satellite data apart (logarithmic spatial and temporal scales, diverse spectral channels, massive volumes, and sparse annotations) and highlights deployment, evaluation, and ethical challenges that demand specialized methods. The authors advocate for SatML-specific learning strategies, architectures, and explicit modeling of domain context, and they discuss how SatML can enrich broader ML research on distribution shift, self-supervised learning (SSL), multi-modal learning, and positional encodings. They further call for community coordination, benchmarks tied to real-world impact, and governance that ensures both global and local benefits. Together, these points aim to elevate SatML from an application area to a standalone, responsible, and impactful research discipline.

Abstract

Satellite data has the potential to inspire a seismic shift for machine learning -- one in which we rethink existing practices designed for traditional data modalities. As machine learning for satellite data (SatML) gains traction for its real-world impact, our field is at a crossroads. We can either continue applying ill-suited approaches, or we can initiate a new research agenda that centers around the unique characteristics and challenges of satellite data. This position paper argues that satellite data constitutes a distinct modality for machine learning research and that we must recognize it as such to advance the quality and impact of SatML research across theory, methods, and deployment. We outline critical discussion questions and actionable suggestions to transform SatML from merely an intriguing application area to a dedicated research discipline that helps move the needle on big challenges for machine learning and society.

Paper Structure

This paper contains 25 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Satellite images of the same location can vary widely depending on factors like spatial resolution and cropping extent, temporal dimension, and satellite mission or instrument. ML methods that leverage these factors can drastically outperform methods for general images.
  • Figure 2: In SatML, multiple observations and multiple (or no) labels may correspond to a given (lat, lon, time) index, whereas in many ML settings, labels are defined directly from images.
  • Figure 3: SatML has distinct considerations for deployment and evaluation. Deployment datasets are often dense, and much larger than training datasets. Spatio-temporal covariate shifts necessitate spatially aware model validation for out-of-sample model deployment. Figure adapted in part from Zvonkov et al. (2023), OpenMapFlow.
  • Figure 4: SatML can enrich many research areas in ML, e.g., multi-modal, self-supervised, and distributionally robust learning.
  • Figure 5: Developing priorities and guidelines for SatML research connects many communities, stakeholders, and disciplines.
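
The Figure 3 caption mentions spatially aware model validation. As a rough illustration only, one common way to approximate this is a spatial block split: samples are grouped into grid cells by latitude and longitude, and whole cells are held out so that test points are not immediate neighbors of training points. The sketch below is not taken from the paper; the function name `spatial_block_split`, the `block_deg` cell size, and the example coordinates are all hypothetical choices made for illustration.

```python
import math
import random
from collections import defaultdict


def spatial_block_split(coords, block_deg=1.0, test_fraction=0.2, seed=0):
    """Split sample indices into train/test sets by holding out whole grid blocks.

    coords: list of (lat, lon) tuples in degrees.
    block_deg: side length of a grid block in degrees (illustrative assumption).
    test_fraction: approximate fraction of blocks (not samples) held out.
    """
    # Group sample indices by the grid block that contains them.
    blocks = defaultdict(list)
    for i, (lat, lon) in enumerate(coords):
        key = (math.floor(lat / block_deg), math.floor(lon / block_deg))
        blocks[key].append(i)

    # Shuffle the block keys reproducibly, then hold out the first few blocks.
    keys = sorted(blocks)
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(test_fraction * len(keys)))

    test = sorted(i for k in keys[:n_test] for i in blocks[k])
    train = sorted(i for k in keys[n_test:] for i in blocks[k])
    return train, test


# Hypothetical coordinates forming three well-separated clusters.
coords = [(0.1, 0.1), (0.2, 0.3), (5.5, 5.5), (5.6, 5.7), (10.2, 10.1), (10.3, 10.4)]
train_idx, test_idx = spatial_block_split(coords, block_deg=1.0, test_fraction=0.34)
print("train:", train_idx, "test:", test_idx)
```

Because entire blocks are assigned to one side of the split, no grid cell contributes samples to both training and test sets. A random per-sample split would not guarantee this, and could overstate out-of-sample performance when nearby locations are strongly correlated. Note that points on either side of a block boundary can still be close together, so this is only a coarse approximation of spatial separation.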