Multi-Level Label Correction by Distilling Proximate Patterns for Semi-supervised Semantic Segmentation

Hui Xiao; Yuting Hong; Li Dong; Diqun Yan; Jiayan Zhuang; Junjie Xiong; Dongtai Liang; Chengbin Peng

TL;DR

This work tackles semi-supervised semantic segmentation, where noisy pseudo-labels on unlabeled data hinder learning. It introduces Multi-Level Label Correction (MLLC), a dual-graph framework comprising a Semantic-Level Graph (SLG) and a Class-Level Graph (CLG) that distill proximate patterns from the feature and label spaces through self-distillation. The method employs an SLG contrastive loss with intra- and inter-image terms and a dynamically weighted CLG cross-entropy loss, iteratively updating the graphs over $K$ rounds with a dynamic threshold $\eta^{(t,c)}$ to mitigate noisy pseudo-labels. Results on Cityscapes and PASCAL VOC 2012 show state-of-the-art performance across partition protocols and backbones, with improved pseudo-label quality and more discriminative features, which matters for data-efficient segmentation.

Abstract

Semi-supervised semantic segmentation relieves the reliance on large-scale labeled data by leveraging unlabeled data. Recent semi-supervised semantic segmentation approaches mainly resort to pseudo-labeling methods to exploit unlabeled data. However, unreliable pseudo-labeling can undermine the semi-supervision process. In this paper, we propose an algorithm called Multi-Level Label Correction (MLLC), which uses graph neural networks to capture structural relationships in Semantic-Level Graphs (SLGs) and Class-Level Graphs (CLGs) in order to rectify erroneous pseudo-labels. Specifically, SLGs represent semantic affinities between pairs of pixel features, and CLGs describe classification consistencies between pairs of pixel labels. With the support of proximate pattern information from the graphs, MLLC can rectify incorrectly predicted pseudo-labels and facilitate discriminative feature representations. We design an end-to-end network to train and apply this label correction mechanism. Experiments demonstrate that MLLC significantly improves supervised baselines and outperforms state-of-the-art approaches in different scenarios on the Cityscapes and PASCAL VOC 2012 datasets. Specifically, MLLC improves the supervised baseline by at least 5% and 2% with DeepLabV2 and DeepLabV3+, respectively, under different partition protocols.
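
To make the two graph types concrete, the following is a minimal sketch and not the paper's implementation. It assumes, purely for illustration, that SLG affinities are cosine similarities between pixel embeddings, that CLG consistencies are inner products of per-pixel class probabilities, and that one correction step averages class probabilities over semantically similar pixels. The function names, the `temperature` parameter, and the toy data are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, 1e-12)


def semantic_level_graph(features):
    """Assumed SLG: pairwise cosine affinities between pixel features (N x N)."""
    return [[cosine(u, v) for v in features] for u in features]


def class_level_graph(probs):
    """Assumed CLG: pairwise agreement of class distributions (N x N)."""
    return [[sum(a * b for a, b in zip(p, q)) for q in probs] for p in probs]


def refine_probs(probs, slg, temperature=0.1):
    """Illustrative correction step: softmax-weighted average of the class
    probabilities of semantically similar pixels (hypothetical update rule)."""
    num_classes = len(probs[0])
    refined = []
    for row in slg:
        weights = [math.exp(s / temperature) for s in row]
        total = sum(weights)
        refined.append([
            sum(w * p[c] for w, p in zip(weights, probs)) / total
            for c in range(num_classes)
        ])
    return refined


# Toy data: pixels 0-2 share similar features, pixel 3 is different.
# Pixel 2 carries a noisy prediction that favors class 1.
features = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05], [0.0, 1.0]]
probs = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.1, 0.9]]

slg = semantic_level_graph(features)
clg = class_level_graph(probs)
refined = refine_probs(probs, slg)
labels = [max(range(len(r)), key=lambda c: r[c]) for r in refined]
print(labels)
```

In this toy case the noisy pixel is pulled toward the class of its feature-space neighbors, which is the intuition behind using proximate patterns to correct pseudo-labels. The paper's actual update alternates between the two graphs over several rounds and uses a dynamic threshold, neither of which this sketch attempts to reproduce.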

Paper Structure (13 sections, 17 equations, 8 figures, 11 tables, 1 algorithm)

Introduction
Related Work
Our Approach
Preliminary
Semantic-Level Graph and Class-Level Graph Construction
Self-Distillation Between Semantic-Level Graph and Class-Level Graph
Training Process
Experimental Results
Experiment Setting
Comparison with State-of-the-Art Methods
Ablation Study
Conclusion
Acknowledgment

Figures (8)

Figure 1: A brief illustration. (a) Ground-truth labels for pixels, represented by GREEN and PINK and shown as a mask on the input image. (b) Pseudo-labels without refinement. (c) Pseudo-labels refined by the MLLC framework. (d) The left and right subplots represent prediction distributions with and without MLLC, respectively. By correcting noisy pseudo-labels, the framework can help classifiers find better decision boundaries. (e) The updating process of MLLC, in which the CLG aggregates semantic knowledge from the SLG and the SLG aggregates classification information from the CLG during iterative refinements.
Figure 2: Overview of the proposed framework. The network consists of a CNN-based encoder and a decoder with two heads: a classification head, denoted $Cls$, and a feature embedding head, denoted $Emb$. For an unlabeled image, we first apply a strong data augmentation and a weak data augmentation, denoted $S. Aug.$ and $W. Aug.$ respectively, and then feed the two augmented views into the student and teacher networks, respectively. The networks are followed by a two-level graph framework designed to generate refined segmentation predictions, with $O_C$ and $O_S$ denoting the output size of the embedding head and the classification head respectively. These more reliable predictions and features are then used as supervision for the student network.
Figure 3: Sensitivity analysis for hyper-parameters $\lambda$.
Figure 4: Ablation study on multi-level graph. Quality comparison on pseudo-labels generated by MLLC and self-training (ST).
Figure 5: Visualization of features. We use t-SNE to map features extracted from the input data to a 2D space, sampling 256 points per class for the plot. The results show that MLLC produces better-separated clusters.
...and 3 more figures
