Active Label Correction for Semantic Segmentation with Foundation Models
Hoyoung Kim, Sehyun Hwang, Suha Kwak, Jungseul Ok
TL;DR
The paper tackles the cost of obtaining pixel-level annotations for semantic segmentation by introducing Active Label Correction (ALC), which uses foundation-model predictions as pseudo labels and issues a correction query: the annotator supplies a true label only when the pseudo label is wrong. ALC combines a diversified, superpixel-aware pixel pool with a look-ahead acquisition function (SIM) to select informative, diverse pixels for correction and to expand corrected labels across superpixels; a cost analysis shows savings over direct classification queries. Empirically, ALC matches or outperforms state-of-the-art active learning methods on Cityscapes, PASCAL, and Kvasir-SEG, and it enables the creation of high-quality datasets such as PASCAL+ by correcting millions of pixels. In practice, this means reduced annotation effort, more reliable datasets, and better segmentation performance when training on corrected data, with notable gains for several target classes.
Abstract
Training and validating models for semantic segmentation require datasets with pixel-wise annotations, which are notoriously labor-intensive to produce. Although useful priors such as foundation models or crowdsourced datasets are available, they are error-prone. We hence propose an effective framework of active label correction (ALC) built on a correction query that asks annotators to rectify the pseudo labels of pixels. According to our theoretical analysis and user study, this query is more annotator-friendly than the standard query, which asks annotators to classify a pixel directly. Specifically, leveraging foundation models that provide useful zero-shot predictions for pseudo labels and superpixels, our method comprises two key techniques: (i) an annotator-friendly design of the correction query using the pseudo labels, and (ii) an acquisition function that looks ahead to the label expansions enabled by the superpixels. Experimental results on the PASCAL, Cityscapes, and Kvasir-SEG datasets demonstrate the effectiveness of our ALC framework, which outperforms prior methods for active semantic segmentation and label correction. Notably, using our method, we obtained a revised version of the PASCAL dataset by rectifying errors in 2.6 million of its pixels.
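To make the correction query concrete, here is a minimal Python sketch of the idea, not the authors' implementation. It assumes a simulated annotator backed by ground-truth labels, and it adopts a simplified expansion rule in which a corrected label is copied to the pixels of the same superpixel that shared the queried pixel's pseudo label. The names `correction_query`, `correct_and_expand`, `pseudo`, `oracle`, and `superpixel_of` are hypothetical, and the acquisition function that chooses which pixels to query is omitted.

```python
from typing import Dict, List, Tuple


def correction_query(pixel: int, pseudo: Dict[int, int],
                     oracle: Dict[int, int]) -> Tuple[int, int]:
    """Simulate one correction query.

    The annotator first checks the pseudo label and classifies the pixel
    only if the pseudo label is wrong.
    Returns (final_label, number_of_classifications_performed).
    """
    if oracle[pixel] == pseudo[pixel]:
        return pseudo[pixel], 0   # pseudo label confirmed, nothing to classify
    return oracle[pixel], 1       # pseudo label rejected, true label supplied


def correct_and_expand(queried: List[int], pseudo: Dict[int, int],
                       oracle: Dict[int, int],
                       superpixel_of: Dict[int, int]) -> Tuple[Dict[int, int], int]:
    """Apply correction queries, then expand each correction.

    Assumed expansion rule (a simplification): a corrected label is copied
    to every pixel in the same superpixel that had the same pseudo label
    as the queried pixel.
    """
    labels = dict(pseudo)         # start from the foundation-model pseudo labels
    n_classifications = 0
    for p in queried:
        label, cost = correction_query(p, pseudo, oracle)
        n_classifications += cost
        if cost:
            for q, sp in superpixel_of.items():
                if sp == superpixel_of[p] and pseudo[q] == pseudo[p]:
                    labels[q] = label
    return labels, n_classifications


# Toy example: four pixels in two superpixels; superpixel 1 is mislabeled.
pseudo = {0: 1, 1: 1, 2: 2, 3: 2}
oracle = {0: 1, 1: 1, 2: 3, 3: 3}
superpixel_of = {0: 0, 1: 0, 2: 1, 3: 1}
labels, n_classifications = correct_and_expand([0, 2], pseudo, oracle, superpixel_of)
```

In this toy run, two pixels are queried but only one requires a classification, and that single correction fixes both pixels of the mislabeled superpixel. This is the kind of saving the cost analysis refers to, although the actual cost model in the paper may differ from this count.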
