Knowledge Consultation for Semi-Supervised Semantic Segmentation

Thuan Than, Nhat-Anh Nguyen-Dang, Dung Nguyen, Salwa K. Al Khatib, Ahmed Elhagry, Hai Phan, Yihui He, Zhiqiang Shen, Marios Savvides, Dang Huynh

TL;DR

SegKC tackles the annotation bottleneck in semantic segmentation by leveraging unlabeled data in a semi-supervised setting. It introduces Knowledge Consultation, a senior-junior co-training framework that applies Cross Pseudo Supervision to heterogeneous backbones, enabling bi-directional knowledge exchange at feature and prediction levels. Only the junior model is used at inference to keep the model compact and efficient. On Pascal VOC 2012 and Cityscapes, SegKC achieves state-of-the-art results across multiple labeled-data partitions, with ablations confirming the benefits of heterogeneous backbones and the knowledge-consultation mechanism.
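
A generic Cross Pseudo Supervision (CPS) loss can be sketched as follows. The sketch uses NumPy, and two arrays of per-pixel logits stand in for the senior and junior outputs on the same unlabeled batch. Each branch is trained with pixel-wise cross-entropy against the other branch's hard (argmax) pseudo-labels. This is a minimal illustration of standard CPS, not SegKC's exact objective. The function names are hypothetical, and the sketch omits loss weighting, confidence filtering, and the supervised loss on labeled images.

```python
import numpy as np

def softmax(logits, axis=1):
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def pixel_cross_entropy(logits, labels):
    # logits: (N, C, H, W) scores; labels: (N, H, W) integer class ids.
    probs = softmax(logits, axis=1)
    # Probability assigned to the target class at every pixel -> (N, H, W).
    picked = np.take_along_axis(probs, labels[:, None, :, :], axis=1)[:, 0]
    return float(-np.log(np.clip(picked, 1e-12, 1.0)).mean())

def cps_loss(logits_a, logits_b):
    # Hard pseudo-labels from each branch (hypothetical helper, generic CPS).
    pseudo_a = logits_a.argmax(axis=1)
    pseudo_b = logits_b.argmax(axis=1)
    # Each branch is supervised by the *other* branch's pseudo-labels.
    return pixel_cross_entropy(logits_a, pseudo_b) + pixel_cross_entropy(logits_b, pseudo_a)

rng = np.random.default_rng(0)
senior_logits = rng.normal(size=(2, 3, 4, 4))  # stand-in for senior output
junior_logits = rng.normal(size=(2, 3, 4, 4))  # stand-in for junior output
print(cps_loss(senior_logits, junior_logits))
```

Because `argmax` yields integer labels, an autodiff framework would not propagate gradients through the pseudo-labels. In the usual CPS design, each network therefore acts only as a label source for the other.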

Abstract

Semi-Supervised Semantic Segmentation reduces reliance on extensive annotations by exploiting unlabeled data together with state-of-the-art models to improve overall performance. Despite the success of deep co-training methods, their underlying mechanisms remain underexplored. This work revisits Cross Pseudo Supervision with dual heterogeneous backbones and introduces Knowledge Consultation (SegKC) to further enhance segmentation performance. SegKC achieves significant improvements on the Pascal VOC and Cityscapes benchmarks. On Pascal VOC, it reaches mIoU scores of 87.1%, 89.2%, and 89.8% with the 1/4, 1/2, and full partitions, respectively, while maintaining a compact model architecture.
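
The mIoU scores reported above use the standard segmentation metric. For each class c, IoU_c = TP_c / (TP_c + FP_c + FN_c), and mIoU is the average over classes. The following minimal NumPy sketch assumes integer label maps and an ignore label of 255, which is the common Pascal VOC convention for boundary pixels. The helper names are hypothetical. Benchmark protocols typically accumulate the confusion matrix over the whole validation set before computing IoU, rather than averaging per-image scores.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes, ignore_index=255):
    # Keep only pixels that carry a valid ground-truth label.
    mask = target != ignore_index
    idx = num_classes * target[mask].astype(np.int64) + pred[mask].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)  # rows: ground truth, cols: prediction

def mean_iou(conf):
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp  # predicted as class c but labeled otherwise
    fn = conf.sum(axis=1) - tp  # labeled as class c but predicted otherwise
    union = tp + fp + fn
    valid = union > 0           # skip classes absent from both GT and prediction
    return float((tp[valid] / union[valid]).mean())

pred = np.array([[0, 0, 1], [1, 2, 2]])
target = np.array([[0, 1, 1], [1, 2, 255]])  # last pixel is ignored
conf = confusion_matrix(pred, target, num_classes=3)
print(mean_iou(conf))  # per-class IoU 0.5, 2/3, 1.0 -> mean of about 0.722
```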

Paper Structure

This paper contains 15 sections, 10 equations, 4 figures, and 6 tables.

Figures (4)

  • Figure 1: Efficiency comparison of our method, SegKC, against state-of-the-art approaches on the original Pascal VOC dataset. We evaluate SegKC against leading methods using mean Intersection over Union (mIoU) and overall model size. With heterogeneous backbones in a Knowledge Consultation framework, SegKC offers a more dynamic knowledge transfer. Co-training both models increases training cost, but only the junior model is used for inference, which preserves efficiency and practicality. This figure highlights SegKC’s strong performance on the original Pascal VOC dataset (1/2 partition), where it achieves 89.2% mIoU with 24.8M parameters and sets a new benchmark for the dataset. Larger models are shown in darker colors to emphasize SegKC’s compact design and effectiveness. A detailed comparison is provided in the results table for the original Pascal VOC dataset (tab:pascal_origin).
  • Figure 2: Overview of the SegKC method for Semi-Supervised Semantic Segmentation using labeled and unlabeled images. The method incorporates two distinct backbones, termed the senior and junior models. Knowledge transfers from the junior to the senior model to enrich the senior’s representations. The senior model distills knowledge back to the junior model and refines its understanding before generating final predictions. By using heterogeneous backbones and employing the junior model exclusively for inference, this design enhances Cross Pseudo Supervision while maintaining a compact and efficient architecture.
  • Figure 3: Qualitative comparison on the original Pascal VOC 2012 dataset between AllSpark, SegKC (ours), and ground truth (GT) using the 1/2 partition.
  • Figure 4: Qualitative comparison on the Cityscapes dataset between BeyondPixels, SegKC (ours), and ground truth (GT) using the 1/2 partition.
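
Figure 2 describes a two-way exchange. Junior features enrich the senior model, and the senior model distills its predictions back into the junior model. The following NumPy sketch shows one hypothetical way to express both directions. It assumes a 1x1 channel projection for feature-level fusion and temperature-scaled KL-divergence distillation (standard knowledge distillation) for prediction-level transfer. The names `fuse_junior_into_senior`, `distill_kl`, `proj`, and `temperature` are illustrative assumptions. The source does not specify SegKC's actual fusion or distillation operators.

```python
import numpy as np

def softmax(x, axis=1):
    # Numerically stable softmax over the class axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_junior_into_senior(senior_feat, junior_feat, proj):
    # Hypothetical feature-level transfer (junior -> senior):
    # proj has shape (C_s, C_j) and acts as a 1x1 convolution that maps
    # junior features (N, C_j, H, W) into the senior channel space.
    projected = np.einsum("sj,njhw->nshw", proj, junior_feat)
    return senior_feat + projected  # (N, C_s, H, W)

def distill_kl(senior_logits, junior_logits, temperature=2.0):
    # Hypothetical prediction-level transfer (senior -> junior):
    # per-pixel KL(p_senior || p_junior) on temperature-softened distributions,
    # scaled by T^2 as in standard knowledge distillation.
    p_s = softmax(senior_logits / temperature, axis=1)
    p_j = softmax(junior_logits / temperature, axis=1)
    eps = 1e-12
    kl = (p_s * (np.log(p_s + eps) - np.log(p_j + eps))).sum(axis=1)
    return float(kl.mean() * temperature ** 2)

rng = np.random.default_rng(1)
senior_feat = rng.normal(size=(2, 4, 3, 3))
junior_feat = rng.normal(size=(2, 2, 3, 3))
fused = fuse_junior_into_senior(senior_feat, junior_feat, rng.normal(size=(4, 2)))
senior_logits = rng.normal(size=(2, 3, 3, 3))
junior_logits = rng.normal(size=(2, 3, 3, 3))
print(fused.shape, distill_kl(senior_logits, junior_logits))
```

At inference only the junior branch would run, so predictions reduce to `junior_logits.argmax(axis=1)`. This is consistent with the compact-deployment design described for Figures 1 and 2.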