Weakly Supervised Co-training with Swapping Assignments for Semantic Segmentation
Xinyu Yang, Hossein Rahmani, Sue Black, Bryan M. Williams
TL;DR
This paper tackles the challenge of weakly supervised semantic segmentation where CAMs used for pseudo-labels are unstable and often rely on offline refinements. It introduces Co-training with Swapping Assignments (CoSA), a fully end-to-end dual-stream framework with an online network and an assignment network that swap CAM-based and segmentation pseudo-labels to mutually supervise each other. Key contributions include guiding CAMs with segmentation pseudo-labels (SPL), reliability-aware weighting of pseudo-labels (RAW), dynamic thresholding to adapt to learning, and a contrastive separation loss to mitigate CAM coexistence. The approach achieves state-of-the-art results on VOC and COCO, reduces or eliminates the need for post-hoc refinements like CRFs, and demonstrates strong training efficiency, signaling a practical, single-stage alternative to multi-stage WSSS pipelines.
Abstract
Class activation maps (CAMs) are commonly employed in weakly supervised semantic segmentation (WSSS) to produce pseudo-labels. Due to incomplete or excessive class activation, existing studies often resort to offline CAM refinement, introducing additional stages or proposing offline modules. This can cause optimization difficulties for single-stage methods and limit generalizability. In this study, we aim to reduce the observed CAM inconsistency and error to mitigate reliance on refinement processes. We propose an end-to-end WSSS model incorporating guided CAMs, wherein our segmentation model is trained while concurrently optimizing CAMs online. Our method, Co-training with Swapping Assignments (CoSA), leverages a dual-stream framework, where one sub-network learns from the swapped assignments generated by the other. We introduce three techniques: i) soft perplexity-based regularization to penalize uncertain regions; ii) a threshold-searching approach to dynamically revise the confidence threshold; and iii) contrastive separation to address the coexistence problem. CoSA demonstrates exceptional performance, achieving mIoU of 76.2\% and 51.0\% on VOC and COCO validation datasets, respectively, surpassing existing baselines by a substantial margin. Notably, CoSA is the first single-stage approach to outperform all existing multi-stage methods including those with additional supervision. Code is available at \url{https://github.com/youshyee/CoSA}.
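The swapped-assignment idea described above can be illustrated with a toy, dependency-free sketch. This is not the authors' implementation: the function names, the per-pixel softmax, the fixed confidence threshold (the paper revises it dynamically), the `IGNORE` marker, and the unweighted sum of the two cross-entropy terms are all assumptions made for illustration. Each "image" is a flat list of pixels, and each pixel holds one raw score per class.

```python
import math

IGNORE = -1  # hypothetical marker: pixel is too uncertain to supervise


def softmax(scores):
    """Numerically stable softmax over one pixel's class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def to_pseudo_labels(pixel_scores, threshold):
    """Assign each pixel its top class, or IGNORE if confidence < threshold."""
    labels = []
    for scores in pixel_scores:
        probs = softmax(scores)
        best = max(range(len(probs)), key=probs.__getitem__)
        labels.append(best if probs[best] >= threshold else IGNORE)
    return labels


def cross_entropy(pixel_scores, labels):
    """Mean cross-entropy over supervised pixels; 0.0 if all are ignored."""
    losses = [
        -math.log(softmax(scores)[label])
        for scores, label in zip(pixel_scores, labels)
        if label != IGNORE
    ]
    return sum(losses) / len(losses) if losses else 0.0


def swapped_loss(online_cam, online_seg, assign_cam, assign_seg, threshold=0.6):
    """Toy swapped objective (assumed form, not the paper's exact loss).

    Pseudo-labels come from the assignment stream and are swapped across
    heads: CAM-derived labels supervise the online segmentation output,
    and segmentation-derived labels supervise the online CAM output.
    """
    cam_labels = to_pseudo_labels(assign_cam, threshold)
    seg_labels = to_pseudo_labels(assign_seg, threshold)
    loss_seg = cross_entropy(online_seg, cam_labels)  # CAM labels -> seg head
    loss_cam = cross_entropy(online_cam, seg_labels)  # seg labels -> CAM head
    return loss_seg + loss_cam
```

In this sketch, a pixel whose top class probability falls below the threshold contributes nothing to the loss, which is one simple way to keep unreliable CAM regions from supervising the other stream.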
