MetaSeg: Content-Aware Meta-Net for Omni-Supervised Semantic Segmentation

Shenwang Jiang, Jianan Li, Ying Wang, Wenxuan Wu, Jizhou Zhang, Bo Huang, Tingfa Xu

TL;DR

MetaSeg tackles omni-supervised semantic segmentation by actively identifying and down-weighting noisy pseudo-label regions. It introduces a Content-Aware Meta-Net (CAM-Net) that leverages multi-level feature inconsistency, class embeddings, and prototype guidance to generate pixel-wise weights for a SegNet, coupled with a decoupled alternating optimization to accelerate meta-learning. The method achieves competitive or superior performance across VOC2012, Cityscapes, ISIC, Bijie Landslide, and OCHuman, approaching fully supervised results and demonstrating strong cross-domain robustness. This work offers a practical, noise-resilient solution for leveraging weak annotations in large-scale semantic segmentation pipelines.

Abstract

Noisy labels, inevitably existing in pseudo segmentation labels generated from weak object-level annotations, severely hamper model optimization for semantic segmentation. Previous works often rely on massive hand-crafted losses and carefully tuned hyper-parameters to resist noise, suffering from poor generalization capability and high model complexity. Inspired by recent advances in meta learning, we argue that rather than struggling to passively tolerate noise hidden behind clean labels, a more feasible solution is to actively find the noisy regions so that they can simply be ignored during model optimization. With this in mind, this work presents a novel meta-learning-based semantic segmentation method, MetaSeg, that comprises a primary content-aware meta-net (CAM-Net) to serve as a noise indicator for an arbitrary segmentation model counterpart. Specifically, CAM-Net learns to generate pixel-wise weights that suppress noisy regions with incorrect pseudo labels while highlighting clean ones by exploiting hybrid strengthened features from image content, providing straightforward and reliable guidance for optimizing the segmentation model. Moreover, to break the barrier of time-consuming training when applying meta learning to common large segmentation models, we further present a new decoupled training strategy that optimizes different model layers in a divide-and-conquer manner. Extensive experiments on object, medical, remote sensing and human segmentation show that our method achieves superior performance, approaching that of fully supervised settings, which paves a promising new way for omni-supervised semantic segmentation.

Paper Structure (15 sections, 13 equations, 8 figures, 4 tables, 1 algorithm)

This paper contains 15 sections, 13 equations, 8 figures, 4 tables, 1 algorithm.

Figures (8)

  • Figure 1: Per-class noise rate of pseudo labels generated by GrabCut on PASCAL VOC. The black dotted line represents the average noise rate on all the categories.
  • Figure 2: Overview of MetaSeg. The CAM-Net guides the training of SegNet by generating content-dependent pixel-wise weights for the loss function to ignore label-noisy regions (masked by red-dotted boxes).
  • Figure 3: Overall framework of MetaSeg. It comprises a content-aware meta-net (CAM-Net) and a segmentation network (SegNet). CAM-Net takes both the intermediate features of SegNet and the pseudo label as input and generates content-dependent pixel-wise weights by judging the inconsistency between multi-level image features and by exploiting embedded semantic label information. In the pixel-wise re-weighted loss that guides the training of SegNet, the generated weights suppress noisy regions and highlight clean ones.
  • Figure 4: Illustration of the decoupled alternating training strategy, which comprises three steps: Virtual-Train, Meta-Train, and Actual-Train. The SegNet is optimized on $\mathcal{D}_{\rm B}$, comprising a large number of coarsely labeled data, while the CAM-Net is trained on $\mathcal{D}_{\rm C}$, comprising a small number of finely labeled data. $\rightarrow$ and $\dashrightarrow$ denote forward and backward propagation, respectively.
  • Figure 5: Visualization of ground truths, pseudo labels, and weights generated by CAM-Net at the training stage (left), and qualitative results at the inference stage (right) on PASCAL VOC 2012. Weight maps are overlaid on the input image to make the spatial correspondence clear, with red representing high values and blue representing low values. Red dotted boxes highlight the regions with noisy labels. Yellow dotted boxes mark where our method is superior to the baseline.
  • ...and 3 more figures
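
The pixel-wise re-weighted loss described in the captions of Figures 2 and 3 can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `softmax` and `reweighted_pixel_loss`, the flattened list-of-pixels layout, and the normalization by pixel count are all assumptions made for illustration, and the weights here are supplied by hand rather than predicted by a meta-net.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reweighted_pixel_loss(logits, pseudo_labels, weights, eps=1e-12):
    """Mean over pixels of weight * cross-entropy (hypothetical helper).

    logits:        one list of class scores per pixel (image flattened)
    pseudo_labels: one integer class index per pixel, possibly noisy
    weights:       one float per pixel; a low value suppresses that pixel
    Dividing by the pixel count is an assumption of this sketch.
    """
    if not (len(logits) == len(pseudo_labels) == len(weights)):
        raise ValueError("logits, pseudo_labels and weights must align")
    total = 0.0
    for z, y, w in zip(logits, pseudo_labels, weights):
        p_true = softmax(z)[y]          # probability of the pseudo label
        total += w * -math.log(p_true + eps)
    return total / len(logits)


# Two pixels whose scores both favour class 0; the second pseudo label
# (class 1) disagrees with the scores and is treated as noisy here.
logits = [[2.0, 0.0], [2.0, 0.0]]
pseudo_labels = [0, 1]

loss_uniform = reweighted_pixel_loss(logits, pseudo_labels, [1.0, 1.0])
loss_weighted = reweighted_pixel_loss(logits, pseudo_labels, [1.0, 0.0])
print(loss_uniform, loss_weighted)
```

With a weight of zero, the suspect pixel contributes nothing to the loss, so no gradient would push the segmentation model toward its incorrect pseudo label. With uniform weights, that pixel dominates the loss.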