Enhancing Self-Supervised Learning for Remote Sensing with Elevation Data: A Case Study with Scarce And High Level Semantic Labels

Omar A. Castaño-Idarraga, Raul Ramos-Pollán, Freddie Kalaitzis

TL;DR

This work tackles the challenge of learning powerful representations for remote sensing tasks when high-level semantic labels are scarce. It introduces elevation-aware pretext tasks that couple a pixel-level elevation map regression with contrastive self-supervised learning (SimCLR and GLCNet), enabling the backbone to better capture class-correlated information. On the NWRC dataset, SimCLR+Elevation improves image classification while GLCNet+Elevation enhances semantic segmentation, outperforming their respective baselines under limited labeled data. The authors provide open-source NWRC data and code, underscoring the practical potential of leveraging readily available elevation data and high-level labels to boost Earth observation models.
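
To make the coupling of contrastive learning and elevation regression concrete, the following is a minimal sketch of a combined pre-training objective in plain Python. It is not the authors' implementation: the use of mean squared error for the elevation term, the weight `alpha`, the temperature default, and all function names are assumptions made for illustration. GLCNet additionally uses a local contrastive term, which is omitted here.

```python
import math


def nt_xent(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss for two lists of embedding vectors.

    z1[i] and z2[i] are embeddings of two augmented views of image i.
    The temperature default is an illustrative assumption.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    z = [normalize(v) for v in z1 + z2]
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same image
        # Scaled cosine similarities to every embedding except itself.
        sims = [sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(2 * n) if j != i]
        pos_sim = sum(a * b for a, b in zip(z[i], z[pos])) / temperature
        total += math.log(sum(math.exp(s) for s in sims)) - pos_sim
    return total / (2 * n)


def elevation_mse(pred, target):
    """Mean squared error over flattened per-pixel elevation values (assumed loss)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def combined_loss(z1, z2, elev_pred, elev_true, alpha=1.0):
    """Contrastive term plus a weighted elevation regression term.

    `alpha` is a hypothetical weighting factor, not a value from the paper.
    """
    return nt_xent(z1, z2) + alpha * elevation_mse(elev_pred, elev_true)
```

In a real setup both terms would be computed from the outputs of a shared encoder, so gradients from the elevation term also shape the backbone that is later transferred downstream.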

Abstract

This work proposes a hybrid unsupervised and supervised learning method to pre-train models for Earth observation downstream tasks when only a handful of labels denoting very general semantic concepts are available. We combine a contrastive pre-training approach with a pixel-wise regression pretext task that predicts coarse elevation maps, which are commonly available worldwide. We hypothesize that this allows the model to pre-learn useful representations, as elevation maps are generally correlated with the targets of many remote sensing tasks. We assess the performance of our approach on a binary semantic segmentation task and a binary image classification task, both derived from a dataset created for the northwest of Colombia. In both cases, we pre-train our models with 39k unlabeled images, fine-tune them on the downstream tasks with only 80 labeled images, and evaluate them with 2944 labeled images. Our experiments show that our methods, GLCNet+Elevation for segmentation and SimCLR+Elevation for classification, outperform their counterparts without the pixel-wise regression pretext task, namely GLCNet and SimCLR, in terms of macro-average F1 Score and Mean Intersection over Union (MIoU). Our study not only encourages the development of pre-training methods that leverage readily available geographical information, such as elevation data, to enhance the performance of self-supervised methods applied to Earth observation tasks, but also promotes the use of datasets with high-level semantic labels, which are more likely to be updated frequently. Project code is available at https://github.com/omarcastano/Elevation-Aware-SSL.
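
The two reported metrics can be illustrated with a short sketch using the standard definitions of macro-average F1 and mean IoU over flat label sequences (image labels for classification, flattened pixel labels for segmentation). The function names and the convention of scoring an absent class as 0.0 are assumptions for this example, not details taken from the paper.

```python
def confusion_counts(y_true, y_pred, cls):
    """Return (true positives, false positives, false negatives) for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    return tp, fp, fn


def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in classes:
        tp, fp, fn = confusion_counts(y_true, y_pred, c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)  # assumed convention
    return sum(scores) / len(scores)


def mean_iou(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of per-class IoU = TP / (TP + FP + FN)."""
    ious = []
    for c in classes:
        tp, fp, fn = confusion_counts(y_true, y_pred, c)
        denom = tp + fp + fn
        ious.append(tp / denom if denom else 0.0)  # assumed convention
    return sum(ious) / len(ious)


# Toy example with four labels.
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
print(macro_f1(y_true, y_pred))  # (2/3 + 4/5) / 2
print(mean_iou(y_true, y_pred))  # (1/2 + 2/3) / 2
```

Because both metrics average over classes without weighting, a model that ignores the minority class is penalized even when overall accuracy is high.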

Paper Structure

This paper contains 24 sections, 7 equations, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: a) Combined SimCLR+Elevation framework. b) Downstream classification using the pre-trained backbone. c) Combined GLCNet+Elevation framework. d) Downstream segmentation using the pre-trained backbone. In both cases (SimCLR and GLCNet), the ResNet18 encoder is shared during pre-training, and only that encoder is transferred to the downstream tasks. The remaining architectural elements (the projection head $h_d$ and the decoder $d$) are initialized randomly.
  • Figure 2: Samples from the NWRC dataset used in the image classification task. The classification task involves predicting a label rather than a 2D segmentation map.
  • Figure 3: Experimental results of all models, shown alongside the ground truth, on the semantic segmentation dataset.
  • Figure 4: Results from the ablation analysis showcasing the impact of varying amounts of labeled data on the fine-tuning of pre-trained models for image classification.
  • Figure 5: Results from the ablation analysis illustrating the effect of different amounts of labeled data on the fine-tuning of pre-trained models for semantic segmentation.