Localized Region Guidance for Class Activation Mapping in WSSS
Ali Torabi, Sanjog Gaihre, MD Mahbubur Rahman, Yaqoob Majeed
TL;DR
IG-CAM is a principled approach to weakly supervised semantic segmentation that embeds influence functions into the CAM-based localization pipeline. By combining instance-level guidance, multi-scale CAMs, and influence-weighted objectives, the method produces boundary-aware localization maps that cover complete objects. It reports state-of-the-art results on PASCAL VOC 2012 (82.3% mIoU before CRF, 86.6% after CRF) and strong generalization to COCO (51.4% on the val set), with ablations showing substantial gains from influence estimation. The influence-guided framework gives WSSS both a theoretical and a practical mechanism for prioritizing informative samples and regions, improving robustness and boundary quality while keeping computational overhead reasonable.
Abstract
Weakly Supervised Semantic Segmentation (WSSS) addresses the challenge of training segmentation models using only image-level annotations. Existing WSSS methods struggle to localize object boundaries precisely and tend to activate only the most discriminative regions of an object. To address these limitations, we propose IG-CAM (Instance-Guided Class Activation Mapping), a novel approach that leverages instance-level cues and influence functions to generate high-quality, boundary-aware localization maps. Our method introduces three key innovations: (1) Instance-Guided Refinement, which uses object proposals to guide CAM generation and ensure complete object coverage; (2) Influence Function Integration, which captures the relationship between training samples and model predictions; and (3) Multi-Scale Boundary Enhancement, which applies progressive refinement strategies. IG-CAM achieves state-of-the-art performance on PASCAL VOC 2012 with 82.3% mIoU before post-processing, improving to 86.6% after CRF refinement, significantly outperforming previous WSSS methods. Extensive ablation studies validate each component's contribution, establishing IG-CAM as a strong new baseline for weakly supervised semantic segmentation.
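To make the CAM and multi-scale terminology concrete, the following is a minimal sketch of a standard class activation map computed from classifier weights and feature maps, followed by a simple multi-scale fusion. It is not the IG-CAM implementation: the function names (`compute_cam`, `resize_nearest`, `fuse_multiscale`), the nearest-neighbour resizing, and the mean fusion rule are all assumptions chosen for illustration, and the instance-guided and influence-weighted components are not shown.

```python
import numpy as np


def compute_cam(features, class_weights):
    """Standard CAM: weight each feature channel by the classifier weights.

    features:      (C, H, W) feature maps from the last conv layer.
    class_weights: (K, C) weights of the linear classifier after pooling.
    Returns (K, H, W) maps, rectified and scaled to [0, 1] per class.
    """
    cam = np.einsum("kc,chw->khw", class_weights, features)
    cam = np.maximum(cam, 0.0)  # keep positive evidence only
    peak = cam.max(axis=(1, 2), keepdims=True)
    return cam / np.maximum(peak, 1e-8)  # avoid division by zero


def resize_nearest(cam, out_h, out_w):
    """Nearest-neighbour resize of a (K, H, W) array (illustrative choice)."""
    _, h, w = cam.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return cam[:, rows[:, None], cols[None, :]]


def fuse_multiscale(cams, out_h, out_w):
    """Average CAMs from several scales at a common resolution.

    Mean fusion is an assumption here; the source does not specify the rule.
    """
    resized = [resize_nearest(c, out_h, out_w) for c in cams]
    fused = np.mean(resized, axis=0)
    peak = fused.max(axis=(1, 2), keepdims=True)
    return fused / np.maximum(peak, 1e-8)


# Usage with random stand-in features at two scales (hypothetical shapes).
rng = np.random.default_rng(0)
weights = rng.normal(size=(20, 64))  # 20 classes, 64 channels
cam_small = compute_cam(rng.normal(size=(64, 16, 16)), weights)
cam_large = compute_cam(rng.normal(size=(64, 32, 32)), weights)
fused = fuse_multiscale([cam_small, cam_large], 32, 32)
print(fused.shape)
```

Averaging across scales is one common way to trade off the coarse, complete coverage of low-resolution maps against the sharper but sparser responses of high-resolution ones; other fusion rules, such as an element-wise maximum, would fit the same interface.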
