Improving Weakly-Supervised Object Localization Using Adversarial Erasing and Pseudo Label
Byeongkeun Kang, Sinhae Cha, Yeejin Lee
TL;DR
Problem: localizing objects under weak supervision, using only image-level labels. Approach: a WSOL network with three components (a shared feature extractor, a classifier, and a localizer), trained with adversarial erasing losses on feature maps and foreground masks, plus a pixel-level pseudo-label loss that suppresses background activations and raises foreground activations. The total objective combines seven terms: $L = L_{cls} + \gamma_1 L_{cls\_fg} + \gamma_2 L_{ae} + \gamma_3 L_{ae\_fg} + \gamma_4 L_{pseudo} + \gamma_5 L_{bas} + \gamma_6 L_{ac}$. Key contributions: implicit full-object localization without extra inference-time branches, two complementary erasing losses, and pixel-level pseudo-label supervision, all of which improve localization accuracy. Findings: the method achieves state-of-the-art localization on ILSVRC-2012, CUB-200-2011, and PASCAL VOC 2012 with two backbones (MobileNetV1 and InceptionV3); ablations confirm the contribution of each loss and the advantage of higher-resolution shared features. Significance: a practical, end-to-end WSOL framework that suppresses backgrounds more effectively and covers the full object, enabling robust localization in weakly-supervised settings.
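As a rough illustration of how the seven-term objective is assembled, the sketch below computes the weighted sum from already-evaluated loss values. The function name `total_loss`, the dictionary keys, and the example numbers are assumptions made for this illustration; the weights used in the paper are not stated here.

```python
def total_loss(terms, gammas):
    """Return L = L_cls + gamma1*L_cls_fg + ... + gamma6*L_ac.

    `terms` maps a (hypothetical) loss name to its scalar value;
    `gammas` holds the six weights gamma1..gamma6 in order.
    """
    weighted_names = ["cls_fg", "ae", "ae_fg", "pseudo", "bas", "ac"]
    if len(gammas) != len(weighted_names):
        raise ValueError("expected six weights, gamma1..gamma6")
    total = terms["cls"]  # the classification loss is unweighted
    for name, gamma in zip(weighted_names, gammas):
        total += gamma * terms[name]
    return total


# Illustrative values only, not results or settings from the paper.
example_terms = {
    "cls": 1.0, "cls_fg": 0.5, "ae": 0.25, "ae_fg": 0.25,
    "pseudo": 2.0, "bas": 1.0, "ac": 4.0,
}
example_gammas = [1.0, 2.0, 2.0, 0.5, 1.0, 0.25]
print(total_loss(example_terms, example_gammas))
```

Setting a weight to zero removes the corresponding term, which is one simple way to run the kind of per-loss ablation mentioned in the findings.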
Abstract
Weakly-supervised learning approaches have gained significant attention due to their ability to reduce the effort required for human annotations in training neural networks. This paper investigates a framework for weakly-supervised object localization, which aims to train a neural network capable of predicting both the object class and its location using only images and their image-level class labels. The proposed framework consists of a shared feature extractor, a classifier, and a localizer. The localizer predicts pixel-level class probabilities, while the classifier predicts the object class at the image level. Since image-level class labels are insufficient for training the localizer, weakly-supervised object localization methods often encounter challenges in accurately localizing the entire object region. To address this issue, the proposed method incorporates adversarial erasing and pseudo labels to improve localization accuracy. Specifically, novel losses are designed to utilize adversarially erased foreground features and adversarially erased feature maps, reducing dependence on the most discriminative region. Additionally, the proposed method employs pseudo labels to suppress activation values in the background while increasing them in the foreground. The proposed method is applied to two backbone networks (MobileNetV1 and InceptionV3) and is evaluated on three publicly available datasets (ILSVRC-2012, CUB-200-2011, and PASCAL VOC 2012). The experimental results demonstrate that the proposed method outperforms previous state-of-the-art methods across all evaluated metrics.
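The erasing step behind adversarial erasing can be sketched in a few lines. This is a minimal single-channel illustration, assuming a relative threshold on the activation map selects the most discriminative region; the function name `erase_most_discriminative` and the `threshold=0.7` default are hypothetical, and the losses the paper builds on the erased features are not reproduced here.

```python
def erase_most_discriminative(feature_map, activation, threshold=0.7):
    """Zero out features where the activation is highest.

    feature_map, activation: 2-D lists of floats with the same shape.
    A position is erased when its activation is at least
    threshold * (maximum activation). The threshold is an assumed value.
    """
    peak = max(max(row) for row in activation)
    cutoff = threshold * peak
    erased = []
    for feat_row, act_row in zip(feature_map, activation):
        erased.append(
            [0.0 if act >= cutoff else feat
             for feat, act in zip(feat_row, act_row)]
        )
    return erased


# Toy 2x2 example: the right column holds the strongest activations.
features = [[1.0, 2.0], [3.0, 4.0]]
activation = [[0.1, 0.9], [0.4, 1.0]]
print(erase_most_discriminative(features, activation))
```

A network trained to classify from the erased features can no longer rely on the strongest region alone, which is the intuition behind reducing dependence on the most discriminative part of the object.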
