SSL4EO-S12 v1.1: A Multimodal, Multiseasonal Dataset for Pretraining, Updated

Benedikt Blumenstiel, Nassim Ait Ali Braham, Conrad M Albrecht, Stefano Maurogiovanni, Paolo Fraccaro

TL;DR

SSL4EO-S12 v1.1 tackles the misalignment and data-readiness problems of the prior release by aligning Sentinel-1 to Sentinel-2 coordinates over larger areas and delivering analysis-ready, multi-temporal, multimodal data. The approach combines four-season, urban-centric sampling within 50 km of the world’s 10,000 most populated cities, multiple modalities (S-1 GRD, S-2 L1C/L2A, RGB), and a preprocessing pipeline (UTM-based reprojection, NaN imputation, SEnSeI v2 cloud masking) into analysis-ready Zarr files. The dataset contains 246,144 locations and 984,576 samples (four seasonal samples per location) distributed across 3,846 Zarr Zip files, enabling efficient loading for large foundation-model pretraining via self-supervised learning. Released under CC-BY-4.0, the dataset lowers barriers for open research and supports future advances in EO foundation models and geospatial analysis.
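The counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below only uses the three figures from the TL;DR; the per-file numbers it derives assume that locations are spread evenly across the Zarr Zip files, which is an assumption made here for illustration and is not stated in the summary.

```python
# Figures quoted in the TL;DR.
NUM_LOCATIONS = 246_144
NUM_SAMPLES = 984_576
NUM_ZARR_ZIP_FILES = 3_846

# Samples per location: should match the four seasons.
samples_per_location, leftover_samples = divmod(NUM_SAMPLES, NUM_LOCATIONS)

# Per-file averages. ASSUMPTION: every file holds the same number of
# locations; the summary does not say how the files are filled.
locations_per_file, leftover_locations = divmod(NUM_LOCATIONS, NUM_ZARR_ZIP_FILES)
samples_per_file = locations_per_file * samples_per_location

print(f"samples per location: {samples_per_location} (remainder {leftover_samples})")
print(f"locations per file:   {locations_per_file} (remainder {leftover_locations})")
print(f"samples per file:     {samples_per_file}")
```

Both divisions come out without a remainder, so the quoted totals are consistent with four samples per location and, under the even-split assumption, 64 locations (256 samples) per file.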

Abstract

This technical report presents SSL4EO-S12 v1.1, a multimodal, multitemporal Earth Observation dataset designed for pretraining large-scale foundation models. Building on the success of SSL4EO-S12 v1.0, the new version addresses two shortcomings of its predecessor: data misalignment and a data structure that hindered low-barrier, analysis-ready EO processing. SSL4EO-S12 v1.1 covers the world's 10,000 largest cities and their surroundings within a 50 km radius across four seasons, resulting in a diverse collection of nearly one million patches. SSL4EO-S12 v1.1 packages the data in the Zarr file format for cloud-efficient loading and for representing meta-information such as cloud masks and geolocation. Released under the CC-BY-4.0 license, SSL4EO-S12 v1.1 facilitates open research and provides a robust foundation for future advancements in self-supervised learning and geospatial analysis. The dataset is available online at https://datapub.fz-juelich.de/ssl4eo-s12, and we provide additional resources at https://github.com/DLR-MF-DAS/SSL4EO-S12-v1.1.

Paper Structure

This paper contains 4 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Example patches from SSL4EO-S12 v1.1 showing four columns with timestamps of Sentinel-2 (S-2) L2A (left) and Sentinel-1 (S-1) GRD (right) products. S-1 GRD is visualized using a VH-VV-VH pseudo coloring. VV is scaled from -30 dB to 5 dB and VH from -40 dB to 0 dB to account for their different value ranges.
  • Figure 2: Global distribution of SSL4EO-S12 v1.1 training (green) and validation (magenta) center points (not to scale).
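
The S-1 pseudo coloring described in the Figure 1 caption can be sketched with NumPy as follows. This is a minimal illustration, not the authors' plotting code: it assumes the VV and VH bands are already in dB, that scaling is linear with clipping at the range limits, and that "VH-VV-VH" means red = VH, green = VV, blue = VH. The function names `scale_db` and `s1_pseudo_color` are hypothetical.

```python
import numpy as np

# Display ranges in dB, taken from the Figure 1 caption.
VV_RANGE_DB = (-30.0, 5.0)
VH_RANGE_DB = (-40.0, 0.0)


def scale_db(band_db, lo, hi):
    """Linearly map [lo, hi] dB to [0, 1]; values outside are clipped.

    ASSUMPTION: linear scaling with clipping; the caption only gives the ranges.
    """
    band_db = np.asarray(band_db, dtype=np.float32)
    return np.clip((band_db - lo) / (hi - lo), 0.0, 1.0)


def s1_pseudo_color(vv_db, vh_db):
    """Stack scaled bands as (R, G, B) = (VH, VV, VH).

    Inputs are 2-D arrays of backscatter in dB with identical shapes;
    the result has shape (height, width, 3) with values in [0, 1].
    """
    vv = scale_db(vv_db, *VV_RANGE_DB)
    vh = scale_db(vh_db, *VH_RANGE_DB)
    return np.stack([vh, vv, vh], axis=-1)


# Tiny synthetic patch: one pixel at each range minimum, one at each maximum.
vv_patch = np.array([[-30.0, 5.0]])
vh_patch = np.array([[-40.0, 0.0]])
rgb = s1_pseudo_color(vv_patch, vh_patch)
print(rgb.shape)
```

The resulting array can be passed directly to an image viewer that accepts float RGB values in [0, 1]. Because red and blue both carry VH, pixels where VV dominates appear green and pixels where VH dominates appear magenta.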