Learning Semantic Segmentation with Query Points Supervision on Aerial Images
Santiago Rivier, Carlos Hinojosa, Silvio Giancola, Bernard Ghanem
TL;DR
This work addresses the high annotation burden of semantic segmentation in aerial imagery by introducing a weakly supervised framework that takes user-provided query points and expands them to superpixels to form partial masks. It combines this point-to-superpixel strategy with a novel weighted masked loss, enabling end-to-end training across backbones such as DeepLabV3, FCN, and U-Net while maintaining competitive performance relative to fully supervised baselines. The method is validated on the LandCoverAI dataset, where it achieves effective segmentation with significantly reduced labeling effort, and the paper provides detailed ablations on the number of points, the number of superpixels, and the loss design. The practical impact lies in enabling scalable, cost-efficient remote sensing analysis for urban planning, environmental monitoring, and disaster response. The code is publicly available for reuse.
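To make the idea of a masked loss concrete, the following is a minimal NumPy sketch of a cross-entropy that is evaluated only on labeled pixels. It is an illustration under stated assumptions, not the paper's loss: the function name `masked_weighted_ce`, the `UNLABELED = -1` marker, and the use of per-class weights as the weighting scheme are all hypothetical choices, since the summary above does not specify how the weights are defined.

```python
import numpy as np

UNLABELED = -1  # hypothetical marker for pixels without a pseudo-label


def masked_weighted_ce(logits, labels, class_weights):
    """Cross-entropy averaged over labeled pixels only.

    logits:        (C, H, W) float array of raw class scores.
    labels:        (H, W) int array; UNLABELED where no label exists.
    class_weights: (C,) float array (assumed weighting scheme).
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))

    # Keep only pixels that carry a label.
    ys, xs = np.nonzero(labels != UNLABELED)
    if ys.size == 0:
        return 0.0

    cls = labels[ys, xs]
    weights = class_weights[cls]
    total = weights.sum()
    if total == 0:
        return 0.0

    # Negative log-likelihood of the labeled class at each labeled pixel.
    nll = -log_probs[cls, ys, xs]
    return float((weights * nll).sum() / total)
```

Because unlabeled pixels are dropped before the average is taken, the model's predictions at those locations contribute nothing to the loss value, which is what allows training from partial masks.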
Abstract
Semantic segmentation is crucial in remote sensing, where high-resolution satellite images are segmented into meaningful regions. Recent advancements in deep learning have significantly improved satellite image segmentation. However, most of these methods are trained in fully supervised settings that require high-quality pixel-level annotations, which are expensive and time-consuming to obtain. In this work, we present a weakly supervised learning algorithm for training semantic segmentation models that relies only on query point annotations instead of full mask labels. Our proposed approach performs accurate semantic segmentation and improves efficiency by significantly reducing the cost and time required for manual annotation. Specifically, we generate superpixels and propagate the query point labels to the superpixels that contain them, since each superpixel groups pixels with similar semantics. Then, we train semantic segmentation models supervised with images partially labeled with the superpixel pseudo-labels. We benchmark our weakly supervised training approach on an aerial image dataset with different semantic segmentation architectures, showing that we can reach competitive performance compared to fully supervised training while reducing the annotation effort. The code of our proposed approach is publicly available at: https://github.com/santiago2205/LSSQPS.
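The label propagation step described above can be sketched as follows. This is a hedged illustration rather than the released implementation: `grid_superpixels` is a hypothetical stand-in that tiles the image into square cells (the abstract does not name the superpixel algorithm), and `expand_points` and `UNLABELED` are names introduced here for the example.

```python
import numpy as np

UNLABELED = -1  # hypothetical marker for pixels without a pseudo-label


def grid_superpixels(height, width, cell):
    """Stand-in for a real superpixel algorithm: square cells of side `cell`.

    Returns an (H, W) int array of segment ids.
    """
    rows = np.arange(height) // cell
    cols = np.arange(width) // cell
    n_cols = -(-width // cell)  # ceiling division
    return rows[:, None] * n_cols + cols[None, :]


def expand_points(segments, points):
    """Copy each query point's class to every pixel of its superpixel.

    segments: (H, W) int array of superpixel ids.
    points:   iterable of (row, col, class_id) query points.
    Returns an (H, W) partial mask; untouched pixels stay UNLABELED.
    """
    mask = np.full(segments.shape, UNLABELED, dtype=np.int64)
    for row, col, class_id in points:
        mask[segments == segments[row, col]] = class_id
    return mask


# Two query points on a 4x6 image split into 2x2 cells.
segments = grid_superpixels(4, 6, cell=2)
partial_mask = expand_points(segments, [(0, 0, 1), (3, 5, 2)])
labeled_fraction = (partial_mask != UNLABELED).mean()
```

In this toy case the two points label 8 of the 24 pixels, and the remaining pixels stay unlabeled so that a masked loss can skip them during training. If two points with different classes fall in the same superpixel, this sketch lets the later point overwrite the earlier one; how such conflicts are resolved in the paper is not stated in the abstract.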
