Learning Semantic Segmentation with Query Points Supervision on Aerial Images

Santiago Rivier, Carlos Hinojosa, Silvio Giancola, Bernard Ghanem

TL;DR

This work addresses the high annotation burden of semantic segmentation in aerial imagery by introducing a weakly supervised framework that takes user-provided query points and expands them to superpixels to form partial masks. It combines a point-to-superpixel strategy with a novel weighted masked loss, enabling end-to-end training across architectures such as DeepLabV3, FCN, and U-Net while maintaining competitive performance relative to fully supervised baselines. The method is validated on the LandCoverAI dataset, demonstrating effective segmentation with significantly reduced labeling effort and providing detailed ablations on the number of points, the number of superpixels, and the loss design. The practical impact lies in enabling scalable, cost-efficient remote sensing analysis for urban planning, environmental monitoring, and disaster response, with code publicly available for reuse.

Abstract

Semantic segmentation is crucial in remote sensing, where high-resolution satellite images are segmented into meaningful regions. Recent advancements in deep learning have significantly improved satellite image segmentation. However, most of these methods are typically trained in fully supervised settings that require high-quality pixel-level annotations, which are expensive and time-consuming to obtain. In this work, we present a weakly supervised learning algorithm to train semantic segmentation algorithms that only rely on query point annotations instead of full mask labels. Our proposed approach performs accurate semantic segmentation and improves efficiency by significantly reducing the cost and time required for manual annotation. Specifically, we generate superpixels and extend the query point labels into those superpixels that group similar meaningful semantics. Then, we train semantic segmentation models supervised with images partially labeled with the superpixel pseudo-labels. We benchmark our weakly supervised training approach on an aerial image dataset and different semantic segmentation architectures, showing that we can reach competitive performance compared to fully supervised training while reducing the annotation effort. The code of our proposed approach is publicly available at: https://github.com/santiago2205/LSSQPS.
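
The two steps described above, extending query point labels into superpixels and training on the resulting partial masks, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the superpixel label map is assumed to be precomputed by some superpixel algorithm, the names `expand_points_to_superpixels`, `masked_cross_entropy`, and `IGNORE` are hypothetical, and the loss shown is a plain cross-entropy averaged over labeled pixels only, without the weighting that the paper's loss adds.

```python
import numpy as np

IGNORE = -1  # hypothetical marker for pixels that receive no pseudo-label


def expand_points_to_superpixels(superpixels, points):
    """Spread each query point's class over the superpixel that contains it.

    superpixels: (H, W) integer array of superpixel ids (assumed precomputed).
    points: iterable of (row, col, class_id) query point annotations.
    Returns an (H, W) partial mask; unlabeled pixels hold IGNORE.
    """
    partial_mask = np.full(superpixels.shape, IGNORE, dtype=np.int64)
    for row, col, class_id in points:
        segment_id = superpixels[row, col]
        # If two points with different classes share a superpixel,
        # the later point overwrites the earlier one in this sketch.
        partial_mask[superpixels == segment_id] = class_id
    return partial_mask


def masked_cross_entropy(logits, partial_mask):
    """Cross-entropy averaged over labeled pixels only (unweighted sketch).

    logits: (C, H, W) array of per-class scores.
    partial_mask: (H, W) array of class ids, IGNORE where unlabeled.
    """
    labeled = partial_mask != IGNORE
    if not labeled.any():
        return 0.0
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    rows, cols = np.nonzero(labeled)
    picked = log_probs[partial_mask[rows, cols], rows, cols]
    return float(-picked.mean())


# Toy example: a 4x4 image split into four 2x2 superpixels, two query points.
toy_superpixels = np.array([[0, 0, 1, 1],
                            [0, 0, 1, 1],
                            [2, 2, 3, 3],
                            [2, 2, 3, 3]])
toy_points = [(0, 0, 1), (3, 3, 2)]
toy_mask = expand_points_to_superpixels(toy_superpixels, toy_points)
toy_loss = masked_cross_entropy(np.zeros((3, 4, 4)), toy_mask)
```

In the toy example, two annotated pixels become eight labeled pixels, and the remaining eight pixels contribute nothing to the loss. This is the sense in which the partial masks provide more supervisory signal than the query points alone.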

Paper Structure (11 sections, 2 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Proposed WSL Approach for Semantic Segmentation on Aerial Images. Our approach solely considers query point-based annotation on satellite images and relies on superpixel extraction to extend the point-based annotation into larger regions. We minimize our proposed masked loss by leveraging the generated partial mask pseudo-annotations, which provide more supervisory signals than the sole query point-based annotations.
  • Figure 2: Segmentation Performances per Number of Points. We show the performance with different numbers of points for each number of superpixels.
  • Figure 3: Segmentation Performances per Dataset Size. Our results demonstrate that larger datasets lead to improved performance. The experiment was conducted on a dataset of 10,674 images in total.
  • Figure 4: DAL-HERS vs SAM. Comparison of the DAL-HERS and SAM model outputs after labeling the images with points.
  • Figure 5: Qualitative Results on LandCoverAI. Top to bottom: Satellite Image, ground truth, fully supervised results with our pipeline (Ours FSL), and weakly supervised results (Our segmentation). Our proposed WSL approach presents visual results comparable to more expensive FSL setups.
  • ...and 1 more figure