Learning with less: label-efficient land cover classification at very high spatial resolution using self-supervised deep learning

Dakota Hester, Vitor S. Martins, Lucas B. Ferreira, Thainara M. A. Lima

TL;DR

The paper tackles the data scarcity barrier in very high spatial resolution (VHSR) land cover mapping by using BYOL self-supervised pre-training on a large corpus of unlabeled NAIP color-infrared (CIR) imagery to learn a ResNet-101 encoder. This encoder is transferred to multiple semantic segmentation architectures and fine-tuned with only 1,000 labeled patches to produce a 1 m, 8-class land cover map of Mississippi from a statewide ensemble of predictions, validated with 25,000 test points. Across linear probing and end-to-end fine-tuning, the approach yields gains over ImageNet baselines, culminating in a final 1 m Mississippi product with macro F1 ≈ 75.6% and overall accuracy ≈ 87.1%, which illustrates the practical potential of label-efficient VHSR mapping via self-supervised learning. The method provides a scalable blueprint for operational, high-resolution land cover mapping and highlights the value of in-domain pre-training, model ensembles, and cross-validation in data-scarce contexts.
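As a rough illustration of the BYOL objective mentioned above, the sketch below shows the two ingredients that define the method in general: a normalized mean-squared-error loss between the online network's prediction and the target network's projection, and an exponential-moving-average (EMA) update of the target weights. This is a minimal NumPy sketch, not the authors' implementation; the function names, the parameter dictionary layout, and the decay value `tau=0.99` are assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row vector to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def byol_loss(online_pred, target_proj):
    """Normalized MSE between online predictions and target projections.

    Equivalent to 2 - 2 * cosine_similarity, averaged over the batch.
    In a real training loop the target branch receives no gradients.
    """
    p = l2_normalize(online_pred)
    z = l2_normalize(target_proj)
    return float((2.0 - 2.0 * (p * z).sum(axis=-1)).mean())

def ema_update(target_params, online_params, tau=0.99):
    """Move target weights toward online weights (tau is a hypothetical value)."""
    return {
        name: tau * target_params[name] + (1.0 - tau) * online_params[name]
        for name in target_params
    }

# Toy usage: two augmented views of a batch would normally pass through the
# online and target encoders; here random vectors stand in for their outputs.
rng = np.random.default_rng(0)
view_a = rng.normal(size=(4, 8))
view_b = rng.normal(size=(4, 8))
# Symmetrized loss: each view plays the online role once.
loss = byol_loss(view_a, view_b) + byol_loss(view_b, view_a)
```

After pre-training, only the online encoder (here, the ResNet-101 backbone) would be kept and transferred to the segmentation models; the projection and prediction heads are discarded.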

Abstract

Deep learning semantic segmentation methods have shown promising performance for very high spatial resolution (1-m) land cover classification, but the challenge of collecting large volumes of representative training data creates a significant barrier to widespread adoption of such models for meter-scale land cover mapping over large areas. In this study, we present a novel label-efficient approach for statewide 1-m land cover classification using only 1,000 annotated reference image patches with self-supervised deep learning. We use the "Bootstrap Your Own Latent" pre-training strategy with a large amount of unlabeled color-infrared aerial images (377,921 patches of 256×256 pixels at 1-m resolution) to pre-train a ResNet-101 convolutional encoder. The learned encoder weights are subsequently transferred into multiple deep semantic segmentation architectures (FCN, U-Net, Attention U-Net, DeepLabV3+, UPerNet, PAN), which are then fine-tuned using very small training dataset sizes with cross-validation (250, 500, 750 patches). Among the fine-tuned models, we obtained 87.14% overall accuracy and 75.58% macro F1 score using an ensemble of the best-performing U-Net models for comprehensive 1-m, 8-class land cover mapping, covering more than 123 billion pixels over the state of Mississippi, USA. Detailed qualitative and quantitative analysis revealed accurate mapping of open water and forested areas, while highlighting challenges in accurate delineation between cropland, herbaceous, and barren land cover types. These results show that self-supervised learning is an effective strategy for reducing the need for large volumes of manually annotated data, directly addressing a major limitation to high spatial resolution land cover mapping at scale.
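The gap between the reported overall accuracy and macro F1 is typical when classes are imbalanced: overall accuracy is dominated by frequent classes, while macro F1 weights every class equally, so hard minority classes pull it down. The sketch below computes both metrics from a confusion matrix. It is an illustration only; the 3-class matrix is made-up toy data, not results from the paper, and the row/column convention (rows are reference labels, columns are predictions) is an assumption.

```python
import numpy as np

def overall_accuracy(cm):
    """Fraction of samples on the diagonal (correctly classified)."""
    cm = np.asarray(cm, dtype=float)
    return float(np.trace(cm) / cm.sum())

def macro_f1(cm):
    """Unweighted mean of per-class F1 scores.

    Assumes rows are reference labels and columns are predicted labels.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as the class, reference says otherwise
    fn = cm.sum(axis=1) - tp  # reference is the class, predicted otherwise
    denom = 2.0 * tp + fp + fn
    # Classes with no reference and no predicted samples get F1 = 0.
    f1 = np.divide(2.0 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    return float(f1.mean())

# Toy confusion matrix: one dominant, well-mapped class and two small,
# frequently confused classes (hypothetical numbers).
toy_cm = [
    [90, 5, 5],
    [4, 4, 2],
    [3, 2, 5],
]
oa = overall_accuracy(toy_cm)
mf1 = macro_f1(toy_cm)
print(f"overall accuracy = {oa:.3f}, macro F1 = {mf1:.3f}")
```

In this toy case the dominant class keeps overall accuracy high while the two confused classes lower macro F1, mirroring the pattern the abstract describes for cropland, herbaceous, and barren classes.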

Paper Structure

This paper contains 19 sections, 7 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: 2023 color-infrared 1-m NAIP imagery over the state of Mississippi, USA, used in our study for statewide land cover classification.
  • Figure 2: High-level overview of the workflow used to develop, implement, and evaluate a label-efficient strategy for training deep semantic segmentation models for land cover classification over the state of Mississippi with scarce labeled data.
  • Figure 3: Locations of 1,000 sampled patches overlaid on NLCD land cover data for the state of Mississippi, USA (left), and the distribution of land cover classes within each fold (right).
  • Figure 4: Examples of ground truth samples and corresponding NAIP imagery used for model training.
  • Figure 5: BYOL pre-training step.
  • ...and 6 more figures