Localized Region Guidance for Class Activation Mapping in WSSS

Ali Torabi, Sanjog Gaihre, MD Mahbubur Rahman, Yaqoob Majeed

TL;DR

IG-CAM presents a principled approach to weakly supervised semantic segmentation by embedding influence functions into the CAM-based localization pipeline. By combining instance-level guidance, multi-scale CAMs, and influence-weighted objectives, the method produces boundary-aware localization that covers complete objects. The approach delivers state-of-the-art VOC 2012 performance (82.3% mIoU pre-CRF, 86.6% post-CRF) and strong COCO generalization (51.4% val), with ablations showing substantial gains from influence estimation. This influence-guided framework advances WSSS by providing a theoretical and practical mechanism to prioritize informative samples and regions, improving robustness and boundary quality while maintaining reasonable computational overhead.
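The mIoU figures quoted above are means of per-class intersection-over-union. A minimal sketch of that metric follows, assuming integer label maps and no ignore label; the function name `mean_iou` and the toy arrays are illustrative and are not taken from the paper or its evaluation code.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes that occur in pred or target."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    tp = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    valid = union > 0  # skip classes absent from both maps
    return float((tp[valid] / union[valid]).mean())

# Toy 2x3 label maps (assumed data, for illustration only).
target = np.array([[0, 0, 1], [0, 1, 1]])
pred = np.array([[0, 1, 1], [0, 1, 1]])
score = mean_iou(pred, target, num_classes=2)
print(round(score, 4))  # 0.7083, i.e. (2/3 + 3/4) / 2
```

Benchmark evaluations such as PASCAL VOC usually accumulate the confusion matrix over the whole dataset and exclude an ignore label before averaging; this sketch omits both steps for brevity.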

Abstract

Weakly Supervised Semantic Segmentation (WSSS) addresses the challenge of training segmentation models using only image-level annotations. Existing WSSS methods struggle to localize object boundaries precisely and tend to focus only on the most discriminative regions. To address these challenges, we propose IG-CAM (Instance-Guided Class Activation Mapping), a novel approach that leverages instance-level cues and influence functions to generate high-quality, boundary-aware localization maps. Our method introduces three key innovations: (1) Instance-Guided Refinement, which uses object proposals to guide CAM generation and ensure complete object coverage; (2) Influence Function Integration, which captures the relationship between training samples and model predictions; and (3) Multi-Scale Boundary Enhancement, which applies progressive refinement strategies. IG-CAM achieves state-of-the-art performance on PASCAL VOC 2012 with 82.3% mIoU before post-processing, improving to 86.6% after CRF refinement, significantly outperforming previous WSSS methods. Extensive ablation studies validate each component's contribution, establishing IG-CAM as a new benchmark for weakly supervised semantic segmentation.
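For context on what is being refined, a standard class activation map is a classifier-weighted sum of the final convolutional feature maps. The sketch below shows only that baseline, not IG-CAM's instance-guided or influence-weighted variant; the array shapes, the 0.3 seed threshold, and the function names are assumptions made for illustration.

```python
import numpy as np

def class_activation_maps(features, fc_weights, eps=1e-8):
    """Baseline CAMs.

    features:   (K, H, W) feature maps from the last convolutional layer.
    fc_weights: (C, K) weights of the classifier that follows global pooling.
    Returns (C, H, W) maps, ReLU-ed and scaled to [0, 1] per class.
    """
    cams = np.einsum("ck,khw->chw", fc_weights, features)
    cams = np.maximum(cams, 0.0)  # keep positive evidence only
    peak = cams.max(axis=(1, 2), keepdims=True)
    return cams / (peak + eps)

def seed_mask(cam, threshold=0.3):
    """Binary seed region for one class; the threshold is an assumed value."""
    return cam >= threshold

# Random stand-ins for real network activations and weights.
rng = np.random.default_rng(0)
features = rng.random((8, 4, 4))
fc_weights = rng.standard_normal((3, 8))

cams = class_activation_maps(features, fc_weights)
seeds = seed_mask(cams[0])
print(cams.shape, seeds.sum())
```

Because the classifier is trained only to recognize the class, the largest weights tend to land on the most discriminative parts, which is the incomplete-coverage problem the abstract describes.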

Paper Structure

This paper contains 34 sections, 13 equations, 3 figures, and 10 tables.

Figures (3)

  • Figure 1: IG-CAM pipeline overview illustrating the complete workflow from input image to segmentation mask.
  • Figure 2: The IG-CAM pipeline architecture, showing the complete workflow from input image to final segmentation mask. The framework consists of five tightly integrated components unified by influence function integration:
      (1) Multi-scale feature extraction with a ResNet-101 + FPN backbone, enhanced by influence-guided attention to emphasize the most informative regions across four spatial scales (1/4, 1/8, 1/16, 1/32).
      (2) Instance-guided CAM generation using object proposals from Selective Search, with influence-based refinement that weights each proposal by its spatial influence values.
      (3) Comprehensive influence-based importance estimation through Hessian-vector products, computing spatial influence maps that identify the most critical regions for model predictions.
      (4) Influence-weighted learning objectives, including a multi-scale consistency loss, a boundary-aware loss, and an influence-guided classification loss, that prioritize learning from the most impactful samples and regions.
      (5) A progressive multi-stage training strategy with influence map adaptation, followed by DenseCRF post-processing to produce pixel-accurate segmentation masks.
  • Figure 3: CAM and pseudo-label visualization results on PASCAL VOC 2012.
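Component (3) of the Figure 2 caption estimates influence through Hessian-vector products. As a rough illustration of that idea, the sketch below computes the classical influence score, -g_test^T H^{-1} g_train, for a toy logistic-regression model, solving for H^{-1} g_test with conjugate gradient so the Hessian is never formed. This is not the paper's implementation: the model, the damping value, the random data, and all function names are assumptions, and `theta` is an arbitrary parameter vector rather than a trained optimum.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_loss(theta, x, y):
    """Gradient of the logistic loss for one sample (x, y), with y in {0, 1}."""
    return (sigmoid(x @ theta) - y) * x

def hvp(theta, X, v, damping=0.01):
    """(H + damping * I) @ v for the mean logistic loss, without building H."""
    p = sigmoid(X @ theta)
    return X.T @ (p * (1.0 - p) * (X @ v)) / len(X) + damping * v

def conjugate_gradient(matvec, b, iters=100, tol=1e-20):
    """Solve A x = b for symmetric positive-definite A given only matvec."""
    x = np.zeros_like(b)
    r = b.copy()
    d = r.copy()
    rs = r @ r
    for _ in range(iters):
        if rs < tol:
            break
        Ad = matvec(d)
        alpha = rs / (d @ Ad)
        x += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        d = r + (rs_new / rs) * d
        rs = rs_new
    return x

def influence(theta, X, x_train, y_train, x_test, y_test):
    """Approximate effect of upweighting a training sample on the test loss."""
    g_test = grad_loss(theta, x_test, y_test)
    s_test = conjugate_gradient(lambda v: hvp(theta, X, v), g_test)
    return -s_test @ grad_loss(theta, x_train, y_train)

# Toy data (assumed): 50 samples, 5 features.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = (rng.random(50) > 0.5).astype(float)
theta = 0.1 * rng.standard_normal(5)
x_test, y_test = rng.standard_normal(5), 1.0

score = influence(theta, X, X[0], y[0], x_test, y_test)
print(score)
```

The caption describes spatial influence maps over image regions; this sketch scores whole training samples instead, which is the simpler setting in which the Hessian-vector-product trick is usually introduced.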