Supervised and self-supervised land-cover segmentation & classification of the Biesbosch wetlands

Eva Gmelich Meijling, Roberto Del Prete, Arnoud Visser

TL;DR

The paper tackles wetland land-cover classification under annotated-data scarcity by combining supervised learning with self-supervised pretraining via an autoencoder. A U-Net trained from scratch on Sentinel-2 data achieves $85.26\%$ accuracy, while SSL pretraining improves high-resolution results to $88.23\%$. It also introduces a framework to scale manually annotated high-resolution labels to medium-resolution inputs and releases a Sentinel-2 dataset with Dynamic World labels to support reproducibility, highlighting that high-resolution imagery yields sharper segmentation boundaries and finer detail when labels are available.

Abstract

Accurate wetland land-cover classification is essential for environmental monitoring, biodiversity assessment, and sustainable ecosystem management. However, the scarcity of annotated data, especially for high-resolution satellite imagery, poses a significant challenge for supervised learning approaches. To tackle this issue, this study presents a methodology for wetland land-cover segmentation and classification that adopts both supervised and self-supervised learning (SSL). We train a U-Net model from scratch on Sentinel-2 imagery across six wetland regions in the Netherlands, achieving a baseline model accuracy of 85.26%. Addressing the limited availability of labeled data, the results show that SSL pretraining with an autoencoder can improve accuracy, especially for the high-resolution imagery where it is more difficult to obtain labeled data, reaching an accuracy of 88.23%. Furthermore, we introduce a framework to scale manually annotated high-resolution labels to medium-resolution inputs. While the quantitative performance between resolutions is comparable, high-resolution imagery provides significantly sharper segmentation boundaries and finer spatial detail. As part of this work, we also contribute a curated Sentinel-2 dataset with Dynamic World labels, tailored for wetland classification tasks and made publicly available.

Paper Structure

This paper contains 9 sections, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Sentinel-2 satellite image of the Biesbosch region with a conceptual land-cover classification overlay (artist impression by Rijkswaterstaat).
  • Figure 2: Schematic representation of the autoencoder architecture for 256×256 pixel input. The encoder reduces the spatial dimensions by a factor of 2 at each block, compressing the input from 256×256 pixels to 16×16 pixels in the bridge. At the same time, the number of channels increases by a factor of 2, from 64 in the first encoder block, to 512 in the bridge. The decoder restores the spatial dimensions back to 256×256 pixels.
  • Figure 3: Schematic representation of the U-Net architecture for 256×256 pixel input. The encoder reduces the spatial dimensions to 16×16 pixels, with 1024 channels, in the bridge. Skip connections link corresponding encoder and decoder layers, helping to retain spatial information. The decoder reconstructs the segmentation map, restoring the spatial dimensions to 256×256 pixels.
  • Figure 4: Wetland areas included in the Sentinel-2 dataset.
  • Figure 5: Overview of the pre-processing steps applied to both Sentinel-2 and Pléiades NEO datasets. (a) Selection of relevant spectral bands tailored to the classification task. (b) Division of images into patches of 256×256 pixels for medium resolution, or 1024×1024 pixels for very high resolution, for consistency and usability in training. (c.1) Exclusion of patches with dimensions smaller than the set size or containing excessive black pixels ($>10\%$ for Sentinel-2 and $>30\%$ for Pléiades NEO). (c.2) Retention of patches that meet size and quality requirements to ensure consistency in high-resolution data.
  • ...and 5 more figures
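
The patching and filtering steps in the Figure 5 caption, (b) through (c.2), can be illustrated with a minimal sketch. This is not the authors' code. The patch sizes and black-pixel thresholds are taken from the caption; everything else is an assumption made for illustration: the function names, the single-band image stored as a list of rows, and the definition of a "black" pixel as a value of exactly 0.

```python
from typing import Dict, List

Image = List[List[int]]  # assumption: one band, stored as a list of pixel rows

# Values from the Figure 5 caption; the dictionary layout itself is hypothetical.
PATCH_SIZE: Dict[str, int] = {"sentinel2": 256, "pleiades_neo": 1024}
MAX_BLACK_FRACTION: Dict[str, float] = {"sentinel2": 0.10, "pleiades_neo": 0.30}


def split_into_patches(image: Image, size: int) -> List[Image]:
    """Tile the image on a regular grid; edge patches may be undersized."""
    height = len(image)
    width = len(image[0]) if height else 0
    patches = []
    for top in range(0, height, size):
        for left in range(0, width, size):
            patches.append([row[left:left + size] for row in image[top:top + size]])
    return patches


def keep_patch(patch: Image, size: int, max_black_fraction: float) -> bool:
    """Return True if the patch is full-size and not dominated by black pixels."""
    # Step (c.1), first condition: drop patches smaller than the set size.
    if len(patch) != size or any(len(row) != size for row in patch):
        return False
    # Step (c.1), second condition: drop patches with too many black pixels.
    # Assumption: a pixel counts as black when its value is exactly 0.
    black = sum(1 for row in patch for value in row if value == 0)
    return black / (size * size) <= max_black_fraction


def filter_patches(image: Image, size: int, max_black_fraction: float) -> List[Image]:
    """Steps (b) to (c.2): tile the image and retain only the valid patches."""
    return [
        patch
        for patch in split_into_patches(image, size)
        if keep_patch(patch, size, max_black_fraction)
    ]
```

For a Sentinel-2 scene under these assumptions, the call would be `filter_patches(scene, PATCH_SIZE["sentinel2"], MAX_BLACK_FRACTION["sentinel2"])`. A real pipeline would more likely operate on multi-band arrays and may define black pixels differently, for example as no-data in all selected bands.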