A deep multiple instance learning approach based on coarse labels for high-resolution land-cover mapping
Gianmarco Perantoni, Lorenzo Bruzzone
TL;DR
Problem: high-resolution (HR) land-cover mapping from coarse low-resolution (LR) references is hindered by label noise and scale mismatch. Approach: a PU-MIL-based DMIL framework (PU-DMIL) learns pixel-level multi-class classifiers that also predict patch-level labels. It uses attention-based MIL pooling to aggregate pixels and the non-negative PU (nnPU) risk to handle unlabeled instances. It supports two formulations. $R_{MC}$ is a multi-class risk in which the LR label is the dominant class of the patch. $R_{ML}$ is a multi-label nnPU risk. The two are combined as $\mathcal{R} = \beta R_{MC} + (1-\beta) R_{ML}$, and the framework requires the class-prior probabilities $\pi_i$. Contributions: a novel PU-MIL loss design, multiple DMIL pooling variants (Mean, LSE, Attn, GAttn, Prop), and improvements in $AA$ and $mIoU$ on the DFC2020 dataset over a standard training baseline. Finally, the work demonstrates the practical potential of mapping HR land cover from coarse LR maps at scale, with future directions in domain adaptation and in handling class imbalance within a weak supervision framework.
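A minimal sketch of how the combined objective $\mathcal{R} = \beta R_{MC} + (1-\beta) R_{ML}$ could be computed for one batch of patches. Several choices are assumptions, not details confirmed by the paper:

* mean pooling of pixel softmax probabilities for $R_{MC}$;
* log-sum-exp (LSE) pooling of pixel logits for the per-class patch scores in $R_{ML}$;
* the sigmoid loss;
* every patch in the batch serving as the unlabeled sample for each class;
* averaging the per-class risks.

All function names are hypothetical. The per-class term follows the non-negative PU risk estimator of Kiryo et al. (2017), $\pi_i \hat{E}_P[\ell(z,+1)] + \max(0, \hat{E}_U[\ell(z,-1)] - \pi_i \hat{E}_P[\ell(z,-1)])$.

```python
import math

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def sigmoid_loss(z, y):
    # Sigmoid loss l(z, y) = sigmoid(-y * z), with y in {+1, -1}.
    return sigmoid(-y * z)

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def lse_pool(values, r=1.0):
    # LSE pooling: (1/r) * log(mean(exp(r * v))), computed stably.
    m = max(values)
    mean_exp = sum(math.exp(r * (v - m)) for v in values) / len(values)
    return m + math.log(mean_exp) / r

def risk_mc(pixel_logits, labels):
    # Hypothetical multi-class risk: cross-entropy between the mean-pooled
    # pixel probabilities of each patch and its dominant LR label.
    total = 0.0
    for patch, y in zip(pixel_logits, labels):
        probs = [softmax(z) for z in patch]
        pooled = sum(p[y] for p in probs) / len(probs)
        total -= math.log(pooled)
    return total / len(labels)

def risk_ml(pixel_logits, labels, priors, r=1.0):
    # Hypothetical multi-label risk: one nnPU risk per class, averaged.
    risks = []
    for c, pi in enumerate(priors):
        # Patch-level score for class c via LSE pooling over the pixels.
        scores = [lse_pool([z[c] for z in patch], r) for patch in pixel_logits]
        pos = [s for s, y in zip(scores, labels) if y == c]
        if not pos:
            continue  # assumption: skip classes with no positive patch in the batch
        unl = scores  # assumption: every patch is in the unlabeled sample for class c
        r_pos = pi * sum(sigmoid_loss(s, 1) for s in pos) / len(pos)
        r_neg = (sum(sigmoid_loss(s, -1) for s in unl) / len(unl)
                 - pi * sum(sigmoid_loss(s, -1) for s in pos) / len(pos))
        # Non-negative correction: clip the estimated negative-class risk at 0.
        risks.append(r_pos + max(0.0, r_neg))
    return sum(risks) / len(risks) if risks else 0.0

def combined_risk(pixel_logits, labels, priors, beta=0.5, r=1.0):
    # R = beta * R_MC + (1 - beta) * R_ML
    return (beta * risk_mc(pixel_logits, labels)
            + (1 - beta) * risk_ml(pixel_logits, labels, priors, r))

# Two patches, four pixels each, three classes; all-zero logits.
logits = [[[0.0, 0.0, 0.0]] * 4, [[0.0, 0.0, 0.0]] * 4]
print(combined_risk(logits, labels=[0, 2], priors=[0.5, 0.3, 0.2], beta=0.5))
```

With all-zero logits, the pooled class probabilities are uniform, so $R_{MC} = \ln 3$. Every patch score is 0, so each per-class nnPU term equals $0.5$. With $\beta = 0.5$ the combined risk is therefore $0.5 \ln 3 + 0.25 \approx 0.799$. The sketch only evaluates the risk. During training, nnPU takes a gradient step on the negative partial risk when that risk falls below zero. This sketch omits that step.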
Abstract
The quantity and quality of training labels are central problems in high-resolution land-cover mapping with machine-learning-based solutions. In this context, weak labels can be gathered in large quantities by leveraging existing low-resolution or obsolete products. In this paper, we address the problem of training land-cover classifiers using high-resolution imagery (e.g., Sentinel-2) and weak low-resolution reference data (e.g., MODIS-derived land-cover maps). Inspired by recent works in Deep Multiple Instance Learning (DMIL), we propose a method that trains pixel-level multi-class classifiers and predicts low-resolution labels (i.e., patch-level classification), where the actual high-resolution labels are learned implicitly without direct supervision. This is achieved with flexible pooling layers that link the semantics of the pixels in the high-resolution imagery to the low-resolution reference labels. The Multiple Instance Learning (MIL) problem is then re-framed in both a multi-class and a multi-label setting. In the former, the low-resolution annotation represents the majority of the pixels in the patch. In the latter, the annotation only indicates that one of the land-cover classes is present in the patch. Multiple labels can be valid for a patch at a time, whereas the low-resolution reference provides only one, so the remaining classes are unlabeled rather than negative. Therefore, the classifier is trained with a Positive-Unlabeled Learning (PUL) strategy. Experimental results on the 2020 IEEE GRSS Data Fusion Contest dataset show the effectiveness of the proposed framework compared to standard training strategies.
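The pooling layers that link pixels to patch-level labels can be illustrated with a minimal sketch of three of the listed variants: mean pooling, attention pooling (Attn) and gated attention pooling (GAttn). It assumes the attention and gated-attention formulation of Ilse et al. (2018), $a_k \propto \exp(\mathbf{w}^\top(\tanh(\mathbf{V}\mathbf{h}_k) \odot \mathrm{sigm}(\mathbf{U}\mathbf{h}_k)))$. For plain attention, the gate is dropped. The paper's exact layers may differ. The Prop variant is not sketched because its form is not described here. `H` holds one vector per pixel, for instance hypothetical pixel embeddings or class scores. `V`, `U` and `w` stand in for learned weights.

```python
import math

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def matvec(M, x):
    # Matrix-vector product for a list-of-rows matrix.
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def mean_pool(H):
    # Mean pooling: every pixel contributes equally to the patch vector.
    n = len(H)
    return [sum(h[d] for h in H) / n for d in range(len(H[0]))]

def attention_weights(H, V, w, U=None):
    # score_k = w^T (tanh(V h_k) * sigmoid(U h_k)); the gate is skipped when U is None.
    scores = []
    for h in H:
        t = [math.tanh(v) for v in matvec(V, h)]
        if U is not None:
            t = [a * sigmoid(u) for a, u in zip(t, matvec(U, h))]
        scores.append(sum(wi * ti for wi, ti in zip(w, t)))
    # Softmax over the pixels of the patch, so the weights sum to 1.
    return softmax(scores)

def attention_pool(H, V, w, U=None):
    # Patch vector = attention-weighted sum of the pixel vectors.
    a = attention_weights(H, V, w, U)
    return [sum(ak * h[d] for ak, h in zip(a, H)) for d in range(len(H[0]))]

# Three pixels with 2-D vectors; identity V and a hypothetical w.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
print(mean_pool(H))
print(attention_pool(H, V, w=[10.0, 0.0]))
```

Mean pooling gives `[2/3, 2/3]`. With `w = [10.0, 0.0]`, the first feature dominates the scores. Nearly all the weight therefore goes to the first and third pixels, and the attention-pooled vector approaches `[1.0, 0.5]`. The example shows how learned weights let the pooling emphasize the pixels most relevant to the patch label instead of averaging them uniformly.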
