EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision

Diego Velazquez, Pau Rodriguez López, Sergio Alonso, Josep M. Gonfaus, Jordi Gonzalez, Gerardo Richarte, Javier Marin, Yoshua Bengio, Alexandre Lacoste

TL;DR

EarthView addresses the need for scalable, unlabeled data in remote sensing by integrating Satellogic, Sentinel, and NEON imagery into a 15-terapixel dataset spanning 2017–2022. The authors introduce EarthMAE, a time- and source-aware masked autoencoder designed to learn from heterogeneous multi-sensor data with diverse masking strategies and temporal encodings. Key findings show that pre-training with Satellogic data, especially when combined with Sentinel data, yields consistent downstream gains, and that incorporating temporality and specialized masking strategies is crucial for performance. The dataset and model together create an open, scalable platform to study self-supervised learning for Earth monitoring, enabling broader access and paving the way for larger, more capable foundation models in Earth observation.

Abstract

This paper presents EarthView, a comprehensive dataset specifically designed for self-supervision on remote sensing data, intended to enhance deep learning applications on Earth monitoring tasks. The dataset comprises 15 terapixels of global remote-sensing data, combining imagery from a diverse range of sources, including NEON, Sentinel, and a novel release of 1 m spatial resolution data from Satellogic. Our dataset provides a wide spectrum of image data with varying resolutions, harnessed from different sensors and organized coherently into an accessible Hugging Face dataset in Parquet format. This data spans five years, from 2017 to 2022. Accompanying the dataset, we introduce EarthMAE, a tailored Masked Autoencoder developed to tackle the distinct challenges of remote sensing data. Trained in a self-supervised fashion, EarthMAE effectively processes different data modalities such as hyperspectral, multispectral, and topographical data, segmentation maps, and temporal structure. This model helps us show that pre-training on Satellogic data improves performance on downstream tasks. While MAE pre-training on heterogeneous data still leaves a gap to close, we regard this innovative combination of an expansive, diverse dataset and a versatile model adapted for self-supervised learning as a stride forward in deep learning for Earth monitoring.
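The Figure 5 caption describes how EarthMAE builds its input: each source is tokenized into a fixed number of patches, the patches from all sources are concatenated, and concatenated time, source, and positional encodings are added to them. The NumPy sketch below illustrates that token construction under stated assumptions: the per-source linear projections, the sinusoidal time and position encodings, the random source-embedding table, the fractional-year time value, the image shapes, and names such as `build_tokens` and `grid` are all hypothetical choices, not the authors' implementation.

```python
import numpy as np

def sinusoidal(values, dim):
    """Sine/cosine encoding of scalar values into `dim` features (dim must be even)."""
    values = np.asarray(values, dtype=np.float64)[:, None]           # (n, 1)
    freqs = 1.0 / (10000.0 ** (np.arange(dim // 2) / (dim // 2)))    # (dim/2,)
    angles = values * freqs                                           # (n, dim/2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)   # (n, dim)

def patchify(image, grid):
    """Split a (C, H, W) image into grid x grid patches -> (grid*grid, C*ph*pw).

    The patch size adapts to the image size, so every source yields the same
    number of tokens regardless of its resolution (assumes H, W divisible by grid).
    """
    c, h, w = image.shape
    ph, pw = h // grid, w // grid
    x = image.reshape(c, grid, ph, grid, pw).transpose(1, 3, 0, 2, 4)
    return x.reshape(grid * grid, c * ph * pw)

def build_tokens(inputs, proj, source_table, grid, dims):
    """Embed each (source_id, image, time) input and concatenate all tokens.

    proj[source_id]: hypothetical per-source linear patch embedding.
    source_table: hypothetical source-embedding table, shape (num_sources, d_src).
    dims: (d_time, d_src, d_pos); their sum must equal the embedding width.
    """
    d_time, d_src, d_pos = dims
    sequence = []
    for source_id, image, t in inputs:
        emb = patchify(image, grid) @ proj[source_id]               # (N, d_model)
        n, d_model = emb.shape
        assert d_model == d_time + d_src + d_pos
        time_enc = np.repeat(sinusoidal([t], d_time), n, axis=0)
        src_enc = np.repeat(source_table[source_id][None, :], n, axis=0)
        pos_enc = sinusoidal(np.arange(n), d_pos)
        # Concatenate the three encodings, then add them to the patch embeddings.
        encoding = np.concatenate([time_enc, src_enc, pos_enc], axis=1)
        sequence.append(emb + encoding)
    return np.concatenate(sequence, axis=0)  # all sources/dates in one sequence

# Hypothetical example: one 4-band 64x64 source and two 12-band 16x16 acquisitions.
rng = np.random.default_rng(0)
proj = {0: rng.normal(size=(4 * 16 * 16, 64)) * 0.02,
        1: rng.normal(size=(12 * 4 * 4, 64)) * 0.02}
source_table = rng.normal(size=(2, 16))
inputs = [(0, rng.normal(size=(4, 64, 64)), 2019.5),
          (1, rng.normal(size=(12, 16, 16)), 2019.5),
          (1, rng.normal(size=(12, 16, 16)), 2020.1)]
tokens = build_tokens(inputs, proj, source_table, grid=4, dims=(16, 16, 32))
print(tokens.shape)  # (48, 64)
```

Because the encodings are concatenated rather than summed with each other, time, source, and position occupy separate feature slices of the added vector; the (16, 16, 32) split used here is arbitrary.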

Paper Structure

This paper contains 43 sections, 10 figures, and 3 tables.

Figures (10)

  • Figure 1: Different masking schemes explored in our work. Random masking masks random patches across sources/time, while tube masking masks the same patches in every source/time step. Combined masking combines both by first masking some patches consistently across sources/time and then randomly masking a subset of the remaining ones.
  • Figure 2: Samples from the dataset.
  • Figure 3: Spatial coverage for each source. Note that a colored area may contain multiple patches.
  • Figure 4: Temporal distribution of the dataset. NEON data only provides the year, and Satellogic data does not contain the time of day.
  • Figure 5: EarthMAE: The model leverages time information and can digest data from an arbitrary number of sources. Each input source is tokenized into a fixed number of patches and then all patches are concatenated. The time, source, and positional encodings are concatenated and added to the patches.
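The three masking schemes from the Figure 1 caption can be sketched over a grid of tokens indexed by (frame, patch), where a frame is one source or time step. This is a minimal illustration, assuming fixed masking ratios and uniform sampling; the function names, the ratio parameters, and the use of Python's `random.Random` are hypothetical choices, not the authors' implementation.

```python
import random

def random_mask(num_frames, num_patches, ratio, rng):
    """Random masking: sample tokens uniformly across all sources/time steps."""
    tokens = [(f, p) for f in range(num_frames) for p in range(num_patches)]
    return set(rng.sample(tokens, round(ratio * len(tokens))))

def tube_mask(num_frames, num_patches, ratio, rng):
    """Tube masking: mask the same patch positions in every source/time step."""
    positions = rng.sample(range(num_patches), round(ratio * num_patches))
    return {(f, p) for f in range(num_frames) for p in positions}

def combined_mask(num_frames, num_patches, tube_ratio, random_ratio, rng):
    """Combined masking: tube-mask first, then randomly mask part of the rest.

    tube_ratio and random_ratio are hypothetical parameters for illustration.
    """
    masked = tube_mask(num_frames, num_patches, tube_ratio, rng)
    remaining = [(f, p) for f in range(num_frames) for p in range(num_patches)
                 if (f, p) not in masked]
    masked |= set(rng.sample(remaining, round(random_ratio * len(remaining))))
    return masked

rng = random.Random(0)
mask = combined_mask(num_frames=3, num_patches=16, tube_ratio=0.25,
                     random_ratio=0.5, rng=rng)
print(len(mask))  # 30: 4 tube positions x 3 frames, plus 18 of the remaining 36 tokens
```

Tube masking stops the model from reconstructing a patch by copying the same location from another source or date, which is one plausible motivation for mixing it with random masking; the ratios above are arbitrary and not taken from the paper.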