reBEN: Refined BigEarthNet Dataset for Remote Sensing Image Analysis

Kai Norman Clasen, Leonard Hackel, Tom Burgert, Gencer Sumbul, Begüm Demir, Volker Markl

TL;DR

The paper addresses reliability and quality problems in large-scale remote sensing benchmarks such as BigEarthNet by introducing reBEN, a refined dataset built from Sentinel-1/2 patches of size 1200 m × 1200 m. It reprocesses the Sentinel-2 data to Level-2A with the latest version of the sen2cor atmospheric correction tool (v2.11), updates the labels using the 2018 CORINE Land Cover map, and associates each patch with a pixel-level reference map to enable both pixel- and scene-based learning. A geographical-based split assignment reduces spatial correlation across the train, validation, and test sets, and supplementary software (rico-hdl) converts the data into DL-optimized formats; pre-trained model weights are released for reproducibility. The resulting 549,488 patch pairs, together with open code and tools, aim to support more reliable and interpretable DL research for remote sensing image analysis and faster model training through optimized data formats.
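To make the 1200 m × 1200 m patch size concrete, the sketch below converts it into pixel dimensions for the 10 m, 20 m, and 60 m Sentinel-2 bands and enumerates non-overlapping patch windows over a tile. It is a minimal illustration, not the reBEN pipeline: the 109,800 m tile side (the usual extent of a Sentinel-2 granule), anchoring the grid at the tile's upper-left corner, dropping partial border patches, and the helper names `pixels_per_patch` and `patch_windows` are all assumptions.

```python
from typing import Iterator, Tuple

PATCH_SIZE_M = 1200  # patch side length in metres (from reBEN)

def pixels_per_patch(gsd_m: int) -> int:
    """Pixels along one patch side at a given ground sampling distance (GSD)."""
    if PATCH_SIZE_M % gsd_m != 0:
        raise ValueError(f"{PATCH_SIZE_M} m is not a multiple of {gsd_m} m")
    return PATCH_SIZE_M // gsd_m

def patch_windows(tile_side_m: int, gsd_m: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (row_offset, col_offset, size) pixel windows of full, non-overlapping patches.

    Assumption: the grid starts at the tile's upper-left corner and partial
    patches at the right/bottom border are dropped.
    """
    size = pixels_per_patch(gsd_m)
    n = tile_side_m // PATCH_SIZE_M  # number of full patches per tile side
    for r in range(n):
        for c in range(n):
            yield r * size, c * size, size

if __name__ == "__main__":
    for gsd in (10, 20, 60):
        side = pixels_per_patch(gsd)
        print(f"{gsd} m bands: {side} x {side} pixels per patch")
    # Assumed tile side of 109,800 m (10,980 pixels at 10 m).
    windows = list(patch_windows(109_800, 10))
    print(len(windows), "full patches per tile")
```

Under these assumptions, a 1200 m patch is 120 × 120 pixels for the 10 m bands, 60 × 60 for the 20 m bands, and 20 × 20 for the 60 m bands, and 91 × 91 full patches fit along an assumed 109,800 m tile.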

Abstract

This paper presents refined BigEarthNet (reBEN), a large-scale, multi-modal remote sensing dataset constructed to support deep learning (DL) studies for remote sensing image analysis. The reBEN dataset consists of 549,488 pairs of Sentinel-1 and Sentinel-2 image patches. To construct reBEN, we start from the Sentinel-1 and Sentinel-2 tiles used to construct the BigEarthNet dataset and divide them into patches of size 1200 m × 1200 m. We apply atmospheric correction to the Sentinel-2 patches using the latest version of the sen2cor tool, resulting in higher-quality patches than those in BigEarthNet. Each patch is then associated with a pixel-level reference map and scene-level multi-labels, which makes reBEN suitable for both pixel- and scene-based learning tasks. The labels are derived from the most recent CORINE Land Cover (CLC) map of 2018 using the same 19-class nomenclature as BigEarthNet. Using the most recent CLC map overcomes the label noise present in BigEarthNet. Furthermore, we introduce a new geographical-based split assignment algorithm that significantly reduces the spatial correlation among the train, validation, and test sets compared with BigEarthNet, which increases the reliability of the evaluation of DL models. To minimize DL model training time, we introduce software tools that convert the reBEN dataset into a DL-optimized data format. In our experiments, we show the potential of reBEN for multi-modal multi-label image classification by considering several state-of-the-art DL models. The pre-trained model weights, associated code, and complete dataset are available at https://bigearth.net.
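The step from a pixel-level reference map to scene-level multi-labels can be sketched as follows, combined with the 75% annotated-pixel threshold described for Figure 2 and a 19-dimensional multi-hot target of the kind used for multi-label classification. This is a hypothetical sketch, not reBEN's label-derivation code: the `UNLABELED = -1` marker, integer class indices 0 to 18, the rule that any labeled pixel contributes its class to the scene labels, and all function names are assumptions.

```python
from typing import List, Optional, Sequence, Set

NUM_CLASSES = 19             # 19-class nomenclature, as in BigEarthNet
UNLABELED = -1               # hypothetical marker for pixels without a class
MIN_LABELED_FRACTION = 0.75  # patches with fewer annotated pixels are excluded

RefMap = Sequence[Sequence[int]]  # 2D grid of per-pixel class indices

def labeled_fraction(ref_map: RefMap) -> float:
    """Fraction of pixels in the reference map that carry a class label."""
    pixels = [p for row in ref_map for p in row]
    return sum(p != UNLABELED for p in pixels) / len(pixels)

def scene_labels(ref_map: RefMap) -> Optional[Set[int]]:
    """Scene-level label set, or None if the patch would be excluded."""
    if labeled_fraction(ref_map) < MIN_LABELED_FRACTION:
        return None
    # Assumption: every class present in at least one pixel becomes a scene label.
    return {p for row in ref_map for p in row if p != UNLABELED}

def multi_hot(labels: Set[int]) -> List[int]:
    """Encode a label set as a 0/1 target vector for multi-label classification."""
    vec = [0] * NUM_CLASSES
    for c in labels:
        if not 0 <= c < NUM_CLASSES:
            raise ValueError(f"class index {c} outside 0..{NUM_CLASSES - 1}")
        vec[c] = 1
    return vec

if __name__ == "__main__":
    ref = [[2, 2, 11],
           [2, UNLABELED, 11],
           [5, 5, 11]]
    labels = scene_labels(ref)  # 8 of 9 pixels labeled, so the patch is kept
    print(sorted(labels), multi_hot(labels))
```

Keeping the reference map alongside the derived label set is what allows the same patch to serve both pixel-based and scene-based tasks.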

Paper Structure

This paper contains 7 sections, 3 figures, and 1 table.

Figures (3)

  • Figure 1: Two example patches with the associated multi-labels from the BigEarthNet and reBEN datasets, where correct labels are indicated by (C), wrong labels by (W), and missing labels by (M).
  • Figure 2: Examples of pixel-level reference maps with different amounts of unlabeled pixels, depicted in white: (a) shows a reference map with a small number of unlabeled pixels, whose associated patch is retained; and (b) shows a reference map with fewer than 75% of its pixels annotated, so its associated patch is excluded from the reBEN dataset.
  • Figure 3: Results of the BigEarthNet and reBEN split assignment algorithms on one of the 119 tiles. The patches of the training, validation, and test sets are colored blue, yellow, and green, respectively; uncolored areas represent invalid patches. (a) shows the result of the grid-based split assignment algorithm of BigEarthNet, in which patches from different sets lie in close spatial proximity to each other; and (b) shows the result of the geographical-based split assignment algorithm of reBEN, which yields larger distances between patches of different splits.
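One way to reduce spatial correlation between splits, in the spirit of Figure 3, is to assign whole blocks of neighboring patches to a single split instead of assigning patches individually. The sketch below is not reBEN's geographical-based algorithm, whose details are not given here; the block size, the 50/25/25 split fractions, the seeded shuffle, and the function names `block_split` and `cross_split_fraction` are hypothetical. `cross_split_fraction` counts how often horizontally or vertically adjacent patches land in different splits, a simple proxy for the close spatial proximity visible in Figure 3(a).

```python
import random
from collections import Counter
from typing import Dict, Tuple

Patch = Tuple[int, int]  # (row, col) position of a patch in the tile grid

def block_split(n_rows: int, n_cols: int, block: int,
                fractions: Tuple[float, float, float] = (0.5, 0.25, 0.25),
                seed: int = 0) -> Dict[Patch, str]:
    """Assign each block x block group of neighboring patches to one split (hypothetical)."""
    blocks = sorted({(r // block, c // block)
                     for r in range(n_rows) for c in range(n_cols)})
    random.Random(seed).shuffle(blocks)  # deterministic for a fixed seed
    n_train = round(fractions[0] * len(blocks))
    n_val = round(fractions[1] * len(blocks))
    block_to_split = {}
    for i, b in enumerate(blocks):
        if i < n_train:
            block_to_split[b] = "train"
        elif i < n_train + n_val:
            block_to_split[b] = "validation"
        else:
            block_to_split[b] = "test"
    # Every patch inherits the split of the block that contains it.
    return {(r, c): block_to_split[(r // block, c // block)]
            for r in range(n_rows) for c in range(n_cols)}

def cross_split_fraction(assignment: Dict[Patch, str]) -> float:
    """Fraction of horizontally/vertically adjacent patch pairs in different splits."""
    pairs = crossing = 0
    for (r, c), split in assignment.items():
        for nb in ((r, c + 1), (r + 1, c)):
            if nb in assignment:
                pairs += 1
                crossing += split != assignment[nb]
    return crossing / pairs

if __name__ == "__main__":
    per_patch = block_split(20, 20, block=1)  # per-patch baseline
    blocked = block_split(20, 20, block=5)
    print("split sizes:", Counter(blocked.values()))
    print("cross-split pairs, per patch:", round(cross_split_fraction(per_patch), 3))
    print("cross-split pairs, 5x5 blocks:", round(cross_split_fraction(blocked), 3))
```

With `block=1`, every patch is assigned independently, which serves as a per-patch baseline. Larger blocks confine split boundaries to block edges, so on a 20 × 20 grid with `block=5` at most 120 of the 760 adjacent pairs can cross splits; the trade-off is coarser control over the exact split proportions.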