A deep multiple instance learning approach based on coarse labels for high-resolution land-cover mapping

Gianmarco Perantoni, Lorenzo Bruzzone

TL;DR

Problem: high-resolution (HR) land-cover mapping from coarse low-resolution (LR) references is hindered by label noise and scale mismatch. Approach: a positive-unlabeled deep multiple instance learning framework (PU-DMIL) learns pixel-level multi-class predictions and patch-level labels, using attention-based MIL pooling and the non-negative PU (nnPU) risk to handle unlabeled instances. It supports two formulations: $R_{MC}$ for the multi-class dominant-label setting and $R_{ML}$, built on the nnPU risk, for the multi-label setting. The two are combined as $\mathcal{R} = \beta R_{MC} + (1-\beta) R_{ML}$, and the method requires the class-prior probabilities $\pi_i$. Contributions: a novel PU-MIL loss design, multiple DMIL pooling variants (Mean, LSE, Attn, GAttn, Prop), and improvements in average accuracy ($AA$) and mean intersection over union ($mIoU$) on the DFC2020 dataset over a standard training baseline. Finally, the work demonstrates the practical potential of mapping HR land cover from coarse LR maps at scale, with future directions in domain adaptation and in handling class imbalance within a weak-supervision framework.
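To make the combined risk above concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard non-negative PU risk estimator with a sigmoid surrogate loss for a single class, and it treats $R_{MC}$ as an already-computed number. The function names (`sigmoid_loss`, `nnpu_risk`, `combined_risk`) and the example scores are hypothetical.

```python
import math


def sigmoid_loss(z, y):
    """Sigmoid surrogate loss for a score z and a label y in {+1, -1}."""
    return 1.0 / (1.0 + math.exp(y * z))


def nnpu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk for one class (assumed form, sigmoid loss).

    pos_scores: scores of patches labeled with the class (positives).
    unl_scores: scores of patches without that label (unlabeled).
    prior:      class-prior probability pi_i of the class.
    """
    r_p_plus = sum(sigmoid_loss(z, +1) for z in pos_scores) / len(pos_scores)
    r_p_minus = sum(sigmoid_loss(z, -1) for z in pos_scores) / len(pos_scores)
    r_u_minus = sum(sigmoid_loss(z, -1) for z in unl_scores) / len(unl_scores)
    # The max(0, .) clamp keeps the estimated negative-class risk non-negative.
    return prior * r_p_plus + max(0.0, r_u_minus - prior * r_p_minus)


def combined_risk(r_mc, r_ml, beta):
    """Convex combination R = beta * R_MC + (1 - beta) * R_ML."""
    return beta * r_mc + (1.0 - beta) * r_ml


if __name__ == "__main__":
    r_ml = nnpu_risk(pos_scores=[2.0, 1.5], unl_scores=[-1.0, 0.5, -2.0], prior=0.3)
    print(combined_risk(r_mc=0.8, r_ml=r_ml, beta=0.5))
```

With $\beta = 1$ only the multi-class term is used, and with $\beta = 0$ only the multi-label PU term; intermediate values trade off the two assumptions about what the LR label means.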

Abstract

The quantity and the quality of the training labels are central problems in high-resolution land-cover mapping with machine-learning-based solutions. In this context, weak labels can be gathered in large quantities by leveraging existing low-resolution or obsolete products. In this paper, we address the problem of training land-cover classifiers using high-resolution imagery (e.g., Sentinel-2) and weak low-resolution reference data (e.g., MODIS-derived land-cover maps). Inspired by recent work on Deep Multiple Instance Learning (DMIL), we propose a method that trains pixel-level multi-class classifiers and predicts low-resolution labels (i.e., patch-level classification), where the actual high-resolution labels are learned implicitly without direct supervision. This is achieved with flexible pooling layers that link the semantics of the pixels in the high-resolution imagery to the low-resolution reference labels. Then, the Multiple Instance Learning (MIL) problem is re-framed in a multi-class and in a multi-label setting. In the former, the low-resolution annotation represents the majority of the pixels in the patch. In the latter, the annotation only indicates the presence of one of the land-cover classes in the patch, so multiple labels can be valid for a patch at the same time even though the low-resolution reference provides only one. Therefore, the classifier is trained with a Positive-Unlabeled Learning (PUL) strategy. Experimental results on the 2020 IEEE GRSS Data Fusion Contest dataset show the effectiveness of the proposed framework compared to standard training strategies.
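As an illustration of how a pooling layer can link pixel-level outputs to a patch-level prediction, here is a minimal sketch of two generic MIL pooling operators, mean and log-sum-exp (LSE). It is an assumption-laden example rather than the paper's layers: the LSE form with sharpness parameter `r`, the function names, and the sample scores are all hypothetical.

```python
import math


def mean_pool(pixel_scores):
    """Patch score as the average of the pixel scores for one class."""
    return sum(pixel_scores) / len(pixel_scores)


def lse_pool(pixel_scores, r=5.0):
    """Log-sum-exp pooling: (1/r) * log(mean(exp(r * x))).

    Small r behaves like the mean, large r approaches the max.
    The maximum is subtracted before exponentiating for numerical stability.
    """
    m = max(pixel_scores)
    s = sum(math.exp(r * (x - m)) for x in pixel_scores) / len(pixel_scores)
    return m + math.log(s) / r


if __name__ == "__main__":
    # Hypothetical per-pixel scores of one class inside one LR patch.
    scores = [0.1, 0.2, 0.9]
    print(mean_pool(scores), lse_pool(scores))
```

Mean pooling matches the reading in which the LR label reflects the majority of the pixels, whereas a max-like operator such as LSE with a large `r` matches the reading in which the label only signals that the class is present somewhere in the patch.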

Paper Structure

This paper contains 9 sections, 14 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Architecture of the baseline and the proposed methods. (a) Baseline architecture, where the LR label is up-sampled and used as if it were an HR label. (b) Proposed architecture, with a DMIL module that produces both high- and low-resolution predictions of the land cover, allowing the loss to be computed directly at the resolution of the training label.
  • Figure 2: Architecture of the proposed DMIL module. The attention layer generates the class bag representations that are classified with the same classifiers used to generate the HR predictions.
  • Figure 3: Qualitative example of the classification maps obtained by the considered models: (a) Sentinel-2 RGB image, (b) Reference HR land-cover map, (c) Reference LR land-cover map used for training. Classification maps obtained by: (d) the standard model, (e) the Mean PU-DMIL model, (f) the Log-Sum-Exp PU-DMIL model, (g) the Attention PU-DMIL model, (h) the Gated Attention PU-DMIL model, (i) the proposed alternative to the Gated Attention PU-DMIL model.
  • Figure 4: Qualitative example of the classification maps obtained by the considered models: (a) Sentinel-2 RGB image, (b) Reference HR land-cover map, (c) Reference LR land-cover map used for training. Classification maps obtained by: (d) the standard model, (e) the Mean PU-DMIL model, (f) the Log-Sum-Exp PU-DMIL model, (g) the Attention PU-DMIL model, (h) the Gated Attention PU-DMIL model, (i) the proposed alternative to the Gated Attention PU-DMIL model.
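
The attention layer named in the Figure 2 caption can be illustrated with a generic attention-based MIL pooling step: each pixel feature gets a score, the scores are normalized with a softmax, and the bag representation is the weighted sum of the features. The sketch below assumes this generic single-head form. The paper's module produces per-class bag representations, which this example does not reproduce, and the parameters `V` and `w`, the function name, and the toy inputs are hypothetical.

```python
import math


def attention_pool(features, V, w):
    """Generic attention MIL pooling (assumed form, not the paper's exact layer).

    features: list of n pixel feature vectors, each of length d.
    V:        hidden x d weight matrix (list of rows).
    w:        weight vector of length hidden.
    Returns (attention weights, bag representation).
    """
    scores = []
    for h in features:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * ti for wi, ti in zip(w, hidden)))
    # Softmax over the pixels of the patch, stabilized by subtracting the max.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    d = len(features[0])
    bag = [sum(a * h[j] for a, h in zip(weights, features)) for j in range(d)]
    return weights, bag


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0]]
    weights, bag = attention_pool(feats, V=[[1.0, 0.0]], w=[10.0])
    print(weights, bag)
```

Because the bag representation lives in the same feature space as the pixel features, it can be passed to the same classifier that produces the HR predictions, which is the sharing described in the Figure 2 caption.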