IPixMatch: Boost Semi-supervised Semantic Segmentation with Inter-Pixel Relation

Kebin Wu, Wenbin Li, Xiaofei Xiao

TL;DR

IPixMatch addresses data scarcity in semi-supervised semantic segmentation by mining inter-pixel relations within soft pseudo-labels. It extends the standard teacher-student framework with an inter-pixel loss that can use either KL divergence or a correlation-based distance, aligning teacher and student predictions over high-confidence regions to exploit spatial dependencies. The method is designed as a drop-in enhancement without requiring architectural changes and shows consistent gains on Pascal VOC 2012 and Cityscapes, especially with limited labeled data. This work highlights the value of contextual inter-pixel relationships for improving generalization in low-data regimes and suggests broad applicability to other tasks with sparse annotations.

Abstract

The scarcity of labeled data in real-world scenarios is a critical bottleneck for the effectiveness of deep learning. Semi-supervised semantic segmentation is a typical solution for achieving a desirable tradeoff between annotation cost and segmentation performance. However, previous approaches, whether based on consistency regularization or self-training, tend to neglect the contextual knowledge embedded within inter-pixel relations. This neglect leads to suboptimal performance and limited generalization. In this paper, we propose a novel approach, IPixMatch, designed to mine the neglected but valuable Inter-Pixel information for semi-supervised learning. Specifically, IPixMatch is constructed as an extension of the standard teacher-student network, incorporating additional loss terms to capture inter-pixel relations. It shines in low-data regimes by efficiently leveraging the limited labeled data and extracting maximum utility from the available unlabeled data. Furthermore, IPixMatch can be integrated seamlessly into most teacher-student frameworks without modifying the model or adding components. Our straightforward IPixMatch method demonstrates consistent performance improvements across various benchmark datasets under different partitioning protocols.
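
To make the idea of an inter-pixel loss more concrete, the following is a minimal sketch of one plausible reading of the TL;DR: for each class channel, the teacher's and student's soft predictions are compared across the high-confidence pixels, using either a correlation-based distance or KL divergence. The function names, the `threshold` value of 0.95, the per-channel normalization, and the averaging over classes are all assumptions made for illustration; this is not the paper's exact formulation.

```python
import math

EPS = 1e-8  # numerical guard (assumed value)


def pearson_distance(u, v):
    """Return 1 - Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    denom = su * sv
    if denom < EPS:
        # Correlation is undefined for a constant vector; in this sketch
        # such a channel contributes no penalty.
        return 0.0
    return 1.0 - cov / denom


def kl_distance(u, v):
    """KL(u || v) after normalizing each sequence to sum to 1 over pixels."""
    zu, zv = sum(u) + EPS, sum(v) + EPS
    return sum(
        (a / zu) * math.log((a / zu + EPS) / (b / zv + EPS))
        for a, b in zip(u, v)
    )


def inter_pixel_loss(teacher, student, threshold=0.95, distance=pearson_distance):
    """Hypothetical inter-pixel loss over high-confidence pixels.

    teacher, student: per-pixel class probabilities, indexed as
    [pixel][class]. Pixels are kept when the teacher's top probability
    reaches `threshold`.
    """
    keep = [i for i, probs in enumerate(teacher) if max(probs) >= threshold]
    if len(keep) < 2:
        return 0.0  # a relation needs at least two pixels
    num_classes = len(teacher[0])
    total = 0.0
    for c in range(num_classes):
        t = [teacher[i][c] for i in keep]  # teacher values of class c across pixels
        s = [student[i][c] for i in keep]  # student values of class c across pixels
        total += distance(t, s)
    return total / num_classes


# Four pixels, two classes; the last pixel falls below the confidence threshold.
teacher = [[0.97, 0.03], [0.96, 0.04], [0.02, 0.98], [0.60, 0.40]]
student = [[0.03, 0.97], [0.04, 0.96], [0.98, 0.02], [0.50, 0.50]]

loss_corr = inter_pixel_loss(teacher, student)
loss_kl = inter_pixel_loss(teacher, student, distance=kl_distance)
```

In a training loop, a term like this would be added to the existing supervised and consistency losses, in line with the abstract's description of "additional loss terms"; how the term is weighted is not stated in this summary.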

Paper Structure

This paper contains 12 sections, 5 equations, 2 figures, 5 tables.

Figures (2)

  • Figure 1: IPixMatch framework to mine the inter-pixel relation, where each line denotes spatial relation consistency mapping per channel.
  • Figure 2: Qualitative comparison between IPixMatch and UniMatch* on Cityscapes: 1/8 labeled data with RN-50 as backbone. White dashed rectangles highlight the segmentation differences.