DHR: Dual Features-Driven Hierarchical Rebalancing in Inter- and Intra-Class Regions for Weakly-Supervised Semantic Segmentation

Sanghyun Jo, Fei Pan, In-Jae Yu, Kyungsu Kim

TL;DR

DHR addresses a core weakness of weakly-supervised semantic segmentation: the disappearance of minor classes in inter-class regions during seed propagation. It introduces a three-stage propagation framework that first restores vanished seeds via Optimal Transport, then separates inter-class regions with unsupervised feature maps, and finally refines intra-class detail with weakly-supervised cues, all under recursive learning. Empirical results across five benchmarks show state-of-the-art mIoU scores (e.g., VOC 79.8%, COCO 53.9%, Context 49.0%, ADE 32.9%, Stuff 37.4%), with the VOC gap to fully supervised methods reduced by over 84%, highlighting strong practical impact. By combining USS and WSS features in a hierarchical, model-agnostic manner, DHR delivers robust seed propagation without heavy dependence on external models, making it suitable for downstream tasks and potential integration with tools like SAM for enhanced segmentation.

Abstract

Weakly-supervised semantic segmentation (WSS) ensures high-quality segmentation with limited data and excels when employed as input seed masks for large-scale vision models such as Segment Anything. However, WSS faces challenges related to minor classes since those are overlooked in images with adjacent multiple classes, a limitation originating from the overfitting of traditional expansion methods like Random Walk. We first address this by employing unsupervised and weakly-supervised feature maps instead of conventional methodologies, allowing for hierarchical mask enhancement. This method distinctly categorizes higher-level classes and subsequently separates their associated lower-level classes, ensuring all classes are correctly restored in the mask without losing minor ones. Our approach, validated through extensive experimentation, significantly improves WSS across five benchmarks (VOC: 79.8%, COCO: 53.9%, Context: 49.0%, ADE: 32.9%, Stuff: 37.4%), reducing the gap with fully supervised methods by over 84% on the VOC validation set. Code is available at https://github.com/shjo-april/DHR.
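
The two-level idea in the abstract (first separate higher-level groups, then separate the classes inside each group) can be illustrated with a toy nearest-prototype sketch. This is not the authors' implementation: the function name `hierarchical_assign`, the hand-written `class_to_group` mapping, and the cosine-similarity prototype rule are all assumptions made for illustration. The sketch also skips the Optimal Transport seed restoration and the recursive training, and it assumes the seeds already contain at least one pixel per class.

```python
import numpy as np


def _unit(x):
    """L2-normalize along the last axis so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)


def hierarchical_assign(uss_feats, wss_feats, seeds, class_to_group):
    """Toy two-level label assignment (hypothetical helper, not from the paper).

    uss_feats: (N, Du) per-pixel features assumed to come from an unsupervised model.
    wss_feats: (N, Dw) per-pixel features assumed to come from a weakly-supervised model.
    seeds:     (N,) int array, class id for seeded pixels and -1 for unlabeled pixels.
    class_to_group: dict mapping each class id to a higher-level group id
                    (given by hand here; an assumption of this sketch).
    Returns an (N,) int array with a class id for every pixel.
    """
    seeds = np.asarray(seeds)
    uss, wss = _unit(np.asarray(uss_feats, float)), _unit(np.asarray(wss_feats, float))
    classes = sorted(int(c) for c in np.unique(seeds) if c >= 0)
    groups = sorted({class_to_group[c] for c in classes})

    # Level 1 (inter-class): one prototype per group from USS features of seeded pixels.
    seed_group = np.array([class_to_group.get(int(c), -1) for c in seeds])
    group_protos = _unit(np.stack([uss[seed_group == g].mean(axis=0) for g in groups]))
    pixel_group = np.array(groups)[(uss @ group_protos.T).argmax(axis=1)]

    # Level 2 (intra-class): inside each group, pick the closest class prototype
    # computed from WSS features, considering only classes that belong to that group.
    labels = np.full(len(seeds), -1, dtype=int)
    for g in groups:
        members = [c for c in classes if class_to_group[c] == g]
        class_protos = _unit(np.stack([wss[seeds == c].mean(axis=0) for c in members]))
        idx = np.where(pixel_group == g)[0]
        if idx.size:
            labels[idx] = np.array(members)[(wss[idx] @ class_protos.T).argmax(axis=1)]
    return labels
```

Restricting the second step to the classes of one group is what keeps a minor class from being absorbed by a dominant class in a different group, which is the intuition the abstract describes. Note that this sketch may also relabel seeded pixels, since every pixel is reassigned from the prototypes.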

Paper Structure (38 sections, 6 equations, 14 figures, 12 tables)

Figures (14)

  • Figure 1: Importance of WSS. (a): Our WSS approach (DHR) outperforms large-scale vision models [ren2024grounded, you2023ferret] with only image-level supervision and 25% of the parameters, bypassing the need for extensive human annotations (image, text, and box pairs). (b): Our DHR significantly exceeds Grounded SAM [ren2024grounded], Ferret [you2023ferret], and recent WSS models [ru2023token, Lin_2023_CVPR, zhu2023weaktr, Jo_2023_ICCV] in standard benchmark performances [everingham2010pascal, lin2014microsoft].
  • Figure 2: Vanishing problem of adjacent minor classes in WSS outputs. Red boxes illustrate the false prediction of minor classes in pseudo labels generated from WSS, e.g., bottle, person, and backpack. Green boxes highlight our DHR outperforming state-of-the-art baselines [ru2023token, Jo_2023_ICCV] in adjacent class regions.
  • Figure 3: Visualization of heatmaps with target points. (a): Unlike WSS features, USS features can precisely separate inter-class regions (e.g., animals vs. vehicles). (b): Thanks to training with image-level class labels, WSS features can discern specific classes (e.g., dog vs. cat) within the same inter-class region (e.g., animal).
  • Figure 4: Conceptual illustration of our hierarchical clustering. Left: USS feature correlation automatically groups inter classes. Right: Using USS features, we categorize all inter classes (e.g., vehicle and animal) and then separate the intra classes (e.g., car and bus) within each inter class (e.g., vehicle) with WSS features.
  • Figure 5: Overview of DHR. Our framework unfolds in three steps, recovering vanished classes by replacing pixels of input WSS mask with OT-based CAMs for seed initialization. We then employ a hierarchical approach to propagate restored seeds, using unsupervised feature maps for the inter-class segregation (e.g., kitchenware) and weakly-supervised features for the intra-class differentiation (e.g., bottle and cup). Finally, our balanced masks are used to train the segmentation model recursively.
  • ...and 9 more figures