CerraData-4MM: A multimodal benchmark dataset on Cerrado for land use and land cover classification

Mateus de Souza Miranda, Ronny Hänsch, Valdivino Alexandre de Santiago Júnior, Thales Sehn Körting, Erison Carlos dos Santos Monteiro

TL;DR

CerraData-4MM introduces a multimodal 10 m resolution benchmark for land use and land cover classification in the Cerrado's Bico do Papagaio ecoregion, combining Sentinel-1 SAR and Sentinel-2 MSI with a two-level hierarchical label structure. It provides 30,322 patches (128×128) and TerraClass Cerrado 2022 reference maps plus edge information to enable semantic and edge segmentation tasks. Baseline experiments using U-Net and a Vision Transformer-based TransNuSeg show that multimodal fusion and ViT architectures yield superior L1 performance (macro F1 ~57.60% and mIoU ~49.05%), while L2 remains challenging due to class imbalance and subtle spectral differences; balancing helps minority classes but can reduce overall accuracy. The dataset is intended to drive advances in multimodal fusion, hierarchical learning, and imbalance-handling in real-world Cerrado mapping, with code, trained models, and data publicly available. Future work envisions expanding to additional Cerrado ecoregions and modalities (e.g., DEM) to support multi-task learning and broader applicability.
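The two headline metrics above, macro F1 and mIoU, can both be derived from a per-class confusion matrix. The following is a minimal sketch in plain Python, not the authors' evaluation code: the function names are hypothetical, labels are assumed to be integer class indices over flattened pixels, and classes absent from both prediction and reference are scored as 0 (other conventions skip such classes).

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are reference classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_scores(cm):
    """Return per-class F1 and IoU lists from a square confusion matrix."""
    n = len(cm)
    f1s, ious = [], []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                          # reference c, predicted other
        fp = sum(cm[r][c] for r in range(n)) - tp     # predicted c, reference other
        f1_den = 2 * tp + fp + fn
        iou_den = tp + fp + fn
        # Assumption: a class with no support and no predictions scores 0.
        f1s.append(2 * tp / f1_den if f1_den else 0.0)
        ious.append(tp / iou_den if iou_den else 0.0)
    return f1s, ious


def macro_f1_and_miou(cm):
    """Unweighted mean over classes, so minority classes count equally."""
    f1s, ious = per_class_scores(cm)
    return sum(f1s) / len(f1s), sum(ious) / len(ious)


# Toy example with 3 classes and 7 pixels (illustrative data only).
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 0, 2]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
macro_f1, miou = macro_f1_and_miou(cm)
```

Because both metrics average over classes without weighting by pixel count, a poorly predicted minority class lowers them as much as a poorly predicted majority class, which is why they are informative on an imbalanced dataset.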

Abstract

The Cerrado faces increasing environmental pressures, necessitating accurate land use and land cover (LULC) mapping despite challenges such as class imbalance and visually similar categories. To address this, we present CerraData-4MM, a multimodal dataset combining Sentinel-1 Synthetic Aperture Radar (SAR) and Sentinel-2 MultiSpectral Imagery (MSI) with 10 m spatial resolution. The dataset includes two hierarchical classification levels with 7 and 14 classes, respectively, focusing on the diverse Bico do Papagaio ecoregion. We highlight CerraData-4MM's capacity to benchmark advanced semantic segmentation techniques by evaluating a standard U-Net and a more sophisticated Vision Transformer (ViT) model. The ViT achieves superior performance in multimodal scenarios, with the highest macro F1-score of 57.60% and a mean Intersection over Union (mIoU) of 49.05% at the first hierarchical level. Both models struggle with minority classes, particularly at the second hierarchical level, where U-Net's performance drops to an F1-score of 18.16%. Class balancing improves representation for underrepresented classes but reduces overall accuracy, underscoring the trade-off in weighted training. CerraData-4MM offers a challenging benchmark for advancing deep learning models to handle class imbalance and multimodal data fusion. Code, trained models, and data are publicly available at https://github.com/ai4luc/CerraData-4MM.
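To illustrate the weighted-training trade-off mentioned in the abstract, here is a minimal sketch of one common balancing scheme: inverse-frequency class weights applied to a cross-entropy loss. This is an assumption for illustration only; the abstract does not specify which weighting scheme the authors used, and the function names below are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """w_c = N / (K * n_c): rare classes receive proportionally larger weights.

    Assumption: classes with no samples get weight 0 (they never enter the loss).
    """
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(correct class), normalised by the summed weights.

    probs: one list of class probabilities per pixel; labels: integer classes.
    """
    num = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    den = sum(weights[y] for y in labels)
    return num / den


# Toy example: 8 pixels of a majority class, 2 of a minority class.
labels = [0] * 8 + [1] * 2
weights = inverse_frequency_weights(labels, num_classes=2)
```

With these weights, an error on a minority-class pixel costs four times as much as an error on a majority-class pixel. That pushes a model toward the rare classes, but the same pressure can cost accuracy on the dominant classes, which is the trade-off the abstract reports.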

Paper Structure

This paper contains 11 sections, 7 figures, 3 tables.

Figures (7)

  • Figure 1: The Bico do Papagaio, marked in brown with red edges, is the region selected to create the dataset.
  • Figure 2: CerraData-4MM's hierarchical level of classes and its samples.
  • Figure 3: Overall Performance (F1-score and mIoU).
  • Figure 4: Confusion matrix for weighted MSI+SAR L1 data-trained models.
  • Figure 5: Confusion matrix for weighted MSI+SAR L2 data-trained models.
  • ...and 2 more figures