Edge-guided and Class-balanced Active Learning for Semantic Segmentation of Aerial Images

Lianlei Shan, Weiqiang Wang, Ke Lv, Bin Luo

TL;DR

The paper tackles the high annotation burden in aerial image semantic segmentation by introducing edge-guided labeling units and a fully class-balanced active learning framework. It combines an edge-focused labeling strategy, CLIP-informed initial data balance, performance-based subsequent acquisition, class-aware pseudo-labeling, and balanced supervised contrastive learning to address edge errors and severe class imbalance. Empirical results on Deepglobe, Potsdam, and Vaihingen show substantial improvements over state-of-the-art AL methods and across multiple segmentation backbones, with ablations validating each component’s contribution. The work also establishes a fair, strong benchmark for future AL research in aerial imagery and highlights practical gains in labeling efficiency and segmentation accuracy.
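
The two class-balanced acquisition stages named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The CLIP step is abstracted away: each region's predicted class is simply passed in. The initial set is drawn round-robin across those classes. The "performance-based" subsequent budget is split in proportion to 1 - IoU per class and then filled with each class's most uncertain units. All function names, the 1 - IoU weighting, and the use of each unit's dominant predicted class are hypothetical.

```python
import math
from collections import defaultdict


def balanced_initial_selection(region_classes: dict[str, str], budget: int) -> list[str]:
    """Pick `budget` regions round-robin across predicted classes.

    `region_classes` maps a region id to the class predicted for that region
    (by CLIP in the paper; here it is simply an input).
    """
    by_class: dict[str, list[str]] = defaultdict(list)
    for region_id, cls in sorted(region_classes.items()):
        by_class[cls].append(region_id)
    queues = [by_class[c] for c in sorted(by_class)]  # deterministic class order
    selected: list[str] = []
    i = 0
    while len(selected) < budget and any(queues):
        queue = queues[i % len(queues)]
        if queue:
            selected.append(queue.pop(0))  # take the next region of this class
        i += 1
    return selected


def allocate_budget_by_performance(class_iou: dict[str, float], budget: int) -> dict[str, int]:
    """Split the next round's budget so lower-IoU classes receive more units.

    Weights are proportional to (1 - IoU), a hypothetical choice. Largest-
    remainder rounding makes the integer allocations sum exactly to `budget`.
    """
    weights = {c: max(1.0 - iou, 0.0) for c, iou in class_iou.items()}
    total = sum(weights.values())
    if total == 0.0:  # every class is perfect: fall back to a uniform split
        weights = {c: 1.0 for c in class_iou}
        total = float(len(class_iou))
    raw = {c: budget * w / total for c, w in weights.items()}
    alloc = {c: math.floor(v) for c, v in raw.items()}
    leftover = budget - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda c: raw[c] - alloc[c], reverse=True)
    for c in by_remainder[:leftover]:
        alloc[c] += 1
    return alloc


def select_next_units(unit_class: dict[str, str], unit_uncertainty: dict[str, float],
                      alloc: dict[str, int]) -> list[str]:
    """Within each class, take the allocated number of most uncertain units.

    `unit_class` is assumed to hold each unlabeled unit's dominant predicted class.
    """
    chosen: list[str] = []
    for cls, k in sorted(alloc.items()):
        pool = [u for u, c in unit_class.items() if c == cls]
        pool.sort(key=lambda u: unit_uncertainty[u], reverse=True)
        chosen.extend(pool[:k])
    return chosen
```

With per-class IoUs of 0.75, 0.5, and 0.25 and a budget of 12, `allocate_budget_by_performance` assigns 2, 4, and 6 units, so the weakest class receives the largest share.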

Abstract

Semantic segmentation requires pixel-level annotation, which is time-consuming. Active Learning (AL) is a promising way to reduce annotation costs. However, owing to the gap between aerial and natural images, previous AL methods are not well suited to aerial imagery, mainly because they use unsuitable labeling units and neglect class imbalance. Previous labeling units are whole images or regions, which do not account for the characteristics of segmentation tasks and aerial images: segmentation networks often make mistakes in edge regions, and edges in aerial images are often interlaced and irregular. We therefore propose an edge-guided labeling unit as a supplementary labeling unit. In addition, class imbalance is severe and manifests in two ways: aerial images are themselves highly imbalanced, and existing AL strategies do not fully account for class balance. Both seriously degrade AL performance on aerial images. We enforce class balance at every step where imbalance may occur, including the initial labeled data, the subsequently labeled data, and the pseudo-labels. With these two improvements, our method achieves gains of more than 11.2% over state-of-the-art methods on three benchmark datasets (Deepglobe, Potsdam, and Vaihingen) and more than 18.6% over the baseline. Extensive ablation studies show that every module is indispensable. Furthermore, we establish a fair and strong benchmark for future research on AL for aerial image segmentation.
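
The abstract lists pseudo-labels as one of the steps where imbalance can arise. A single global confidence threshold tends to keep mostly majority-class pixels, because the network is usually more confident on them. The sketch below swaps in a per-class quantile threshold in the style of class-balanced self-training (CBST); the paper's exact class-aware rule may differ, and `keep_fraction`, `max_threshold`, and the ignore value 255 are assumptions.

```python
import numpy as np


def class_balanced_pseudo_labels(probs: np.ndarray, keep_fraction: float = 0.5,
                                 max_threshold: float = 0.9,
                                 ignore_index: int = 255) -> np.ndarray:
    """Turn a (C, H, W) softmax map into pseudo-labels with per-class thresholds.

    For each class c, the threshold is the confidence at the (1 - keep_fraction)
    quantile of pixels predicted as c, capped at max_threshold. Pixels below
    their class threshold are set to ignore_index and excluded from the loss.
    """
    pred = probs.argmax(axis=0)  # (H, W) predicted class per pixel
    conf = probs.max(axis=0)     # (H, W) confidence of that prediction
    labels = np.full(pred.shape, ignore_index, dtype=np.int64)
    for c in range(probs.shape[0]):
        in_class = pred == c
        if not in_class.any():
            continue  # class absent from this prediction
        threshold = min(float(np.quantile(conf[in_class], 1.0 - keep_fraction)),
                        max_threshold)
        labels[in_class & (conf >= threshold)] = c
    return labels
```

In a toy map where class-0 pixels have confidences 0.95 and 0.99 and class-1 pixels 0.6 and 0.7, a global 0.9 threshold would keep no class-1 pixels, whereas the per-class rule still keeps the 0.7 pixel.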

Paper Structure

This paper contains 22 sections, 10 equations, 10 figures, and 13 tables.

Figures (10)

  • Figure 1: From left to right: the image, the label, the error map, and the score map. Errors generally surround the edges, and the network's output scores in edge regions are also low (i.e., uncertainty is high).
  • Figure 2: Results of different active learning methods on the Deepglobe dataset [deepglobe] when labeled data account for 20% of the total, using Deeplabv3 [deeplabv3] as the segmentation network. All three approaches show widespread performance imbalance across classes.
  • Figure 3: Overview of the active learning process. First, the original data are divided into labeling units, including rectangles and edge regions (box A). Then, the data to be labeled are selected through initial data acquisition (box C), and these labeled data are used for the initial training of the segmentation network. The next data to be labeled are selected according to the output of the initially trained network, also shown in box C. For unselected data with high output confidence, pseudo-labels are generated and used in network training (box B). Box D represents balanced contrastive learning in the feature extraction part of the segmentation network. Boxes B, C, and D are all class-balancing measures. Blue lines denote network training, and black lines denote data acquisition.
  • Figure 4: The process of obtaining an edge-guided labeling unit. First, an off-the-shelf edge detection algorithm extracts the edges; then dilation expands them into edge regions. The resulting edge regions are the labeling units.
  • Figure 5: Overview of the initial data acquisition procedure. Image regions are fed into CLIP to obtain their classes, and class-balanced samples are then selected according to the predicted classes.
  • ...and 5 more figures
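
The edge-guided labeling unit of Figure 4 (edge detection followed by dilation) can be sketched with NumPy alone. As an assumption, a thresholded finite-difference gradient magnitude stands in for the off-the-shelf edge detector used in the paper, and dilation uses a square structuring element; `threshold`, `radius`, and the function names are hypothetical.

```python
import numpy as np


def gradient_edges(gray: np.ndarray, threshold: float) -> np.ndarray:
    """Stand-in edge detector: threshold the gradient magnitude of a grayscale image."""
    gy, gx = np.gradient(gray.astype(float))  # derivatives along rows and columns
    return np.hypot(gx, gy) > threshold


def dilate(mask: np.ndarray, radius: int) -> np.ndarray:
    """Binary dilation with a (2*radius+1) x (2*radius+1) square structuring element."""
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    # OR together every shifted copy of the mask within the square window
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def edge_region_unit(gray: np.ndarray, threshold: float = 0.25, radius: int = 2) -> np.ndarray:
    """Boolean mask of the dilated edge region, treated as one labeling unit."""
    return dilate(gradient_edges(gray, threshold), radius)
```

For an 8 x 8 image with a vertical step between columns 3 and 4, `edge_region_unit(img, threshold=0.25, radius=1)` marks columns 2 to 5 in every row: the gradient responds in columns 3 and 4, and dilation widens the band by one pixel on each side. The dilation radius controls how much context around each edge the annotator labels.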