Active Learning with Context Sampling and One-vs-Rest Entropy for Semantic Segmentation

Fei Wu, Pablo Marquez-Neila, Hedyeh Rafi-Tarii, Raphael Sznitman

TL;DR

This work tackles the high annotation cost of multi-class semantic segmentation and the importance of boundary pixels by introducing OREAL, a patch-based active learning method that combines maximum aggregation of pixel uncertainties with a novel one-vs-rest entropy score for implicit class balancing. It operates on superpixel patches with dominant labeling to minimize annotation effort, and uses a class-balancing-aware annotation strategy to ensure tail classes are well represented. Across four diverse datasets and multiple architectures, OREAL achieves competitive and often superior AuALC and mIoU, with notable gains when using the max aggregation strategy that emphasizes boundary regions. The approach demonstrates that prioritizing context around objects and class-aware uncertainty can substantially reduce labeling cost while improving segmentation performance in real-world settings.

Abstract

Multi-class semantic segmentation remains a cornerstone challenge in computer vision. Yet, dataset creation remains excessively demanding in time and effort, especially for specialized domains. Active Learning (AL) mitigates this challenge by selecting data points for annotation strategically. However, existing patch-based AL methods often overlook the critical information carried by boundary pixels, which is essential for accurate segmentation. We present OREAL, a novel patch-based AL method designed for multi-class semantic segmentation. OREAL enhances boundary detection by employing maximum aggregation of pixel-wise uncertainty scores. Additionally, we introduce one-vs-rest entropy, a novel uncertainty score function that computes class-wise uncertainties while achieving implicit class balancing during dataset creation. Comprehensive experiments across diverse datasets and model architectures validate our hypothesis.
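
The abstract does not give the formula for one-vs-rest entropy, so the following is only a minimal sketch under one plausible reading: each class is treated as a binary "this class vs. all others" problem, and the score is the binary entropy of that class's probability. The function names and the toy probabilities are illustrative assumptions, not the authors' implementation.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy (in nats) of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome carries no uncertainty
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def one_vs_rest_entropy(probs: list[float]) -> list[float]:
    """Assumed form: one binary entropy per class (class c vs. the rest).

    `probs` is the softmax output for a single pixel. The result has one
    uncertainty value per class rather than a single scalar.
    """
    return [binary_entropy(p) for p in probs]


# Hypothetical softmax output for one pixel over three classes.
pixel_probs = [0.5, 0.3, 0.2]
per_class = one_vs_rest_entropy(pixel_probs)
```

Under this reading, a pixel yields a vector of class-wise uncertainties, which is what would allow a sampler to rank regions per class instead of with one pooled score.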

Paper Structure

This paper contains 17 sections, 3 equations, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: Context sampling with patch-based active learning for semantic segmentation. Mean aggregation (left) ignores patches at the boundaries of the object of interest, while maximum aggregation (right) encourages the sampling of these boundaries. This simple modification of the aggregation function resulted in an improvement of 3 percentage points in mIoU.
  • Figure 2: Comparison between selected superpixel regions (yellow borders) when the score of a superpixel is defined as the Mean (left) or Max (right) aggregation of its pixel scores. The image, from the EndoVis dataset, shows a surgical tool.
  • Figure 3: The first row shows images from Cityscapes and Pascal VOC, while the second row shows images from MONARCH and EndoVis. Each image is overlaid with its segmentation mask, using one color per class.
  • Figure 4: Comparison of different sampling strategies and their ablated versions using the Max aggregation (see the section on max aggregation). Values are averaged over 3 training-validation splits for Cityscapes and Pascal VOC, and over 10 training-validation splits for EndoVis and MONARCH. Error bars indicate one standard deviation. Results using ResNet101 and ViT backbones can be found in the Appendix.
  • Figure 5: Selected superpixel regions of different sampling strategies. On average, we observe that when using the max aggregation, all sampling methods tend to select more regions around the boundary of objects. Additional plots can be found in the appendix.
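
The difference between mean and max aggregation described in Figures 1, 2 and 5 can be illustrated with a small sketch. The helper name `aggregate_scores` and the toy uncertainty values are assumptions made for illustration. The point is only that a superpixel containing a few highly uncertain pixels (as one might expect near an object boundary) ranks low under the mean but high under the max.

```python
def aggregate_scores(pixel_scores, superpixel_ids, mode="max"):
    """Reduce per-pixel uncertainty scores to one score per superpixel."""
    grouped = {}
    for score, sp in zip(pixel_scores, superpixel_ids):
        grouped.setdefault(sp, []).append(score)
    if mode == "max":
        return {sp: max(values) for sp, values in grouped.items()}
    if mode == "mean":
        return {sp: sum(values) / len(values) for sp, values in grouped.items()}
    raise ValueError(f"unknown aggregation mode: {mode!r}")


# Toy data (hypothetical): superpixel 0 is uniformly, moderately uncertain;
# superpixel 1 is mostly confident but contains one very uncertain pixel.
pixel_scores = [0.4, 0.4, 0.4, 0.4, 0.05, 0.05, 0.9, 0.05]
superpixel_ids = [0, 0, 0, 0, 1, 1, 1, 1]

mean_scores = aggregate_scores(pixel_scores, superpixel_ids, mode="mean")
max_scores = aggregate_scores(pixel_scores, superpixel_ids, mode="max")

# The superpixel each strategy would send for annotation first.
top_by_mean = max(mean_scores, key=mean_scores.get)
top_by_max = max(max_scores, key=max_scores.get)
```

In this toy case the mean picks superpixel 0 and the max picks superpixel 1, mirroring the qualitative behavior the figure captions report.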