
Active Label Correction for Semantic Segmentation with Foundation Models

Hoyoung Kim, Sehyun Hwang, Suha Kwak, Jungseul Ok

TL;DR

The paper tackles the cost of obtaining pixel-level annotations for semantic segmentation by introducing Active Label Correction (ALC), which uses foundation-model priors to generate pseudo labels and employs a correction query that asks the annotator for the true label only when the pseudo label is wrong. ALC combines a diversified, superpixel-aware pixel pool with a look-ahead acquisition function (SIM) that selects informative and diverse pixels to correct and expands their labels across superpixels. A cost analysis backs the design by showing savings over direct classification queries. Empirically, ALC matches or outperforms state-of-the-art active learning methods on Cityscapes, PASCAL, and Kvasir-SEG, and it enabled the creation of PASCAL+, a corrected version of PASCAL obtained by fixing millions of pixels. In practice, this means lower annotation effort, more reliable datasets, and better segmentation performance when training on corrected data, including large gains on several target classes.
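To make the pool-plus-acquisition idea concrete, here is a minimal Python sketch assuming NumPy, a precomputed superpixel map, and a per-pixel uncertainty map. The helper names (`build_pixel_pool`, `look_ahead_score`, `select_queries`) and the scoring rule are hypothetical: the pool keeps one candidate pixel per superpixel (its most uncertain pixel) for diversity, and the look-ahead score sums the uncertainty over the whole superpixel that a correction would be expanded to. It illustrates the general idea only and is not the paper's SIM acquisition function, which this excerpt does not define.

```python
from typing import List, Tuple

import numpy as np

Pixel = Tuple[int, int]


def build_pixel_pool(superpixels: np.ndarray, uncertainty: np.ndarray) -> List[Pixel]:
    """Hypothetical diversified pool: one representative (most uncertain) pixel per superpixel."""
    pool: List[Pixel] = []
    for sp_id in np.unique(superpixels):
        ys, xs = np.nonzero(superpixels == sp_id)
        best = int(np.argmax(uncertainty[ys, xs]))
        pool.append((int(ys[best]), int(xs[best])))
    return pool


def look_ahead_score(pixel: Pixel, superpixels: np.ndarray, uncertainty: np.ndarray) -> float:
    """Hypothetical look-ahead score: total uncertainty of every pixel the label would expand to."""
    y, x = pixel
    region = superpixels == superpixels[y, x]
    return float(uncertainty[region].sum())


def select_queries(superpixels: np.ndarray, uncertainty: np.ndarray, budget: int) -> List[Pixel]:
    """Rank the pool by look-ahead score (highest first) and keep the top `budget` pixels."""
    pool = build_pixel_pool(superpixels, uncertainty)
    scores = np.array([look_ahead_score(p, superpixels, uncertainty) for p in pool])
    top = np.argsort(-scores)[:budget]
    return [pool[i] for i in top]


superpixels = np.array([[0, 0, 1],
                        [0, 2, 1],
                        [2, 2, 1]])
uncertainty = np.array([[0.1, 0.2, 0.9],
                        [0.3, 0.5, 0.8],
                        [0.4, 0.6, 0.7]])
print(select_queries(superpixels, uncertainty, budget=2))  # [(0, 2), (2, 1)]
```

Restricting the pool to one pixel per superpixel is what keeps the selected queries spread over different regions; a real system would plug in model-derived uncertainty and the foundation model's superpixels.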

Abstract

Training and validating models for semantic segmentation require datasets with pixel-wise annotations, which are notoriously labor-intensive to obtain. Although useful priors such as foundation models or crowdsourced datasets are available, they are error-prone. We hence propose an effective framework of active label correction (ALC) built on a correction query that rectifies the pseudo labels of pixels; according to our theoretical analysis and user study, this query is more annotator-friendly than the standard query asking the annotator to classify a pixel directly. Specifically, leveraging foundation models that provide useful zero-shot predictions of pseudo labels and superpixels, our method comprises two key techniques: (i) an annotator-friendly design of correction query with the pseudo labels, and (ii) an acquisition function that looks ahead to label expansions based on the superpixels. Experimental results on the PASCAL, Cityscapes, and Kvasir-SEG datasets demonstrate the effectiveness of our ALC framework, which outperforms prior methods for active semantic segmentation and label correction. Notably, using our method, we obtained a revised version of the PASCAL dataset by rectifying errors in 2.6 million pixels.
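The correction query and the superpixel-based label expansion described in the abstract can be sketched as a small simulation. This assumes NumPy label maps and a simulated annotator who knows the true label; `answer_correction_query` is a hypothetical helper. As an illustrative assumption, the annotator's answer (either a confirmation or a corrected class) is written to every pixel in the queried pixel's superpixel.

```python
import numpy as np


def answer_correction_query(labels: np.ndarray, superpixels: np.ndarray,
                            pixel: tuple, true_label: int):
    """Simulate one correction query at `pixel` (hypothetical helper).

    The annotator only supplies a class when the pseudo label is wrong;
    otherwise they just confirm it. Either way, the answer is expanded to
    the pixel's whole superpixel (an illustrative assumption).
    Returns the updated label map and whether a correction was needed.
    """
    y, x = pixel
    needed_correction = bool(labels[y, x] != true_label)
    updated = labels.copy()  # leave the input pseudo labels untouched
    updated[superpixels == superpixels[y, x]] = true_label
    return updated, needed_correction


pseudo = np.array([[0, 2, 1],
                   [0, 0, 1]])
superpixels = np.array([[0, 0, 1],
                        [0, 0, 1]])
fixed, corrected = answer_correction_query(pseudo, superpixels, (0, 0), true_label=3)
print(fixed.tolist(), corrected)  # [[3, 3, 1], [3, 3, 1]] True
```

In the example, the pixel at (0, 1) carried a different pseudo label from the queried pixel and is fixed by expansion alone, which is the kind of effect a look-ahead acquisition function tries to anticipate.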

Paper Structure (30 sections, 1 theorem, 12 equations, 13 figures, 10 tables, 1 algorithm)

Key Result

Theorem 3.1

Assume the information-theoretic annotation cost (Hu et al., 2020) of selecting one out of $L$ possible options to be $\log_2 L$. Let $L \ge 2$ be the number of classes, and let $p$ be the probability that the pseudo label is correct. Then $C_{\textnormal{cls}}(L) = \log_2 L$ and $C_{\textnormal{cor}}(L, p) = \ldots$
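The classification cost in Theorem 3.1 follows directly from the stated assumption, and the same assumption can be applied to a correction query once a model of that query is fixed. The expression for $C_{\textnormal{cor}}(L, p)$ is cut off in this excerpt, so the sketch below does not reproduce it. Instead, purely as an illustrative assumption, it models a correction query as a yes/no check on the pseudo label (one of 2 options, so $\log_2 2 = 1$ bit) followed, with probability $1 - p$, by a choice among the remaining $L - 1$ classes. All function names are hypothetical.

```python
import math


def classification_cost(num_classes: int) -> float:
    """C_cls(L) = log2 L, as stated in Theorem 3.1."""
    if num_classes < 2:
        raise ValueError("need at least two classes")
    return math.log2(num_classes)


def two_step_correction_cost(num_classes: int, p_correct: float) -> float:
    """Illustrative two-step model (an assumption, not necessarily Theorem 3.1's C_cor):
    1 bit to accept/reject the pseudo label, plus log2(L - 1) bits when it is wrong."""
    if num_classes < 2:
        raise ValueError("need at least two classes")
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must lie in [0, 1]")
    return 1.0 + (1.0 - p_correct) * math.log2(num_classes - 1)


def break_even_accuracy(num_classes: int) -> float:
    """Pseudo-label accuracy p at which the two-step model costs exactly log2 L."""
    if num_classes < 3:
        raise ValueError("with L = 2 both costs equal 1 bit for every p")
    return 1.0 - (math.log2(num_classes) - 1.0) / math.log2(num_classes - 1)


L = 21  # e.g., PASCAL VOC: 20 object classes plus background
print(round(classification_cost(L), 3))            # 4.392
print(round(two_step_correction_cost(L, 0.8), 3))  # 1.864
print(round(break_even_accuracy(L), 3))            # 0.215
```

Under this illustrative model with $L = 21$, the correction query is cheaper whenever the pseudo label is correct with probability above roughly 0.215, and it is more expensive when pseudo labels are almost always wrong; the paper's own expression in Theorem 3.1 determines the exact trade-off.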

Figures (13)

  • Figure 1: Examples of noisy and corrected labels in PASCAL. (a, b) Initial pseudo labels are generated by applying Grounded-SAM (G-SAM) to unlabeled images. Noisy pseudo labels, marked by yellow boxes, degrade performance, as shown in the corresponding table (tab:grounded-threshold). (c) The original PASCAL annotations also contain noisy labels, marked by cyan boxes. (d) Using the superpixels from G-SAM, we construct a corrected version of PASCAL, called PASCAL+. For instance, in the first row, we correct an object labeled as person to tvmonitor, and in the second row, an object labeled as background to tvmonitor. The colors black, blue, red, green, and pink represent the background, tvmonitor, chair, sofa, and person classes, respectively.
  • Figure 2: An example of a correction query. A correction query presents an instruction requesting a label for a representative pixel (green star), an image displaying an object within a bounding box (green rectangle), and the possible class options.
  • Figure 3: Effect of active label correction. ALC achieves comparable performance on both datasets with far fewer clicks. ALC (normalized) reflects the reduced budget of correction queries after normalization by Theorem 3.1.
  • Figure 4: Precision and recall comparisons. Our SIM acquisition achieves high recall, indicating that it corrects many noisy pixels within a limited budget.
  • Figure 5: Kvasir-SEG experiments. The proposed SIM acquisition operates robustly on the medical dataset across different budgets.
  • ...and 8 more figures
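Figure 4 reports precision and recall for the corrections, but this excerpt does not define the two metrics. The NumPy sketch below therefore assumes one plausible reading: precision is the fraction of pixels whose label was changed that were noisy before and are correct afterwards, and recall is the fraction of originally noisy pixels that are correct after correction. The function name and the conventions for empty denominators are hypothetical.

```python
import numpy as np


def correction_precision_recall(pseudo: np.ndarray, corrected: np.ndarray,
                                ground_truth: np.ndarray):
    """Precision/recall of label correction under the assumed definitions above."""
    noisy = pseudo != ground_truth                # pixels that started out wrong
    changed = pseudo != corrected                 # pixels whose label was modified
    fixed = noisy & (corrected == ground_truth)   # wrong before, right after
    # Conventions (assumptions): no changes -> precision 0; no noisy pixels -> recall 1.
    precision = fixed.sum() / changed.sum() if changed.any() else 0.0
    recall = fixed.sum() / noisy.sum() if noisy.any() else 1.0
    return float(precision), float(recall)


ground_truth = np.array([[1, 1], [2, 2]])
pseudo = np.array([[1, 0], [0, 0]])       # three noisy pixels
corrected = np.array([[2, 1], [0, 2]])    # fixes two, breaks one correct pixel
# precision = recall = 2/3 in this example
print(correction_precision_recall(pseudo, corrected, ground_truth))
```

High recall under this reading means most noisy pixels end up fixed, which superpixel expansion supports because a single query can correct many pixels at once.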

Theorems & Definitions (2)

  • Theorem 3.1
  • proof