HUWSOD: Holistic Self-training for Unified Weakly Supervised Object Detection

Liujuan Cao, Jianghang Lin, Zebo Hong, Yunhang Shen, Shaohui Lin, Chao Chen, Rongrong Ji

TL;DR

This paper introduces a unified, high-capacity weakly supervised object detection network called HUWSOD, which utilizes a comprehensive self-training framework without needing external modules or additional supervision, and indicates that randomly initialized boxes, although significantly different from well-designed offline object proposals, are effective for WSOD training.

Abstract

Most WSOD methods rely on traditional object proposals to generate candidate regions and are confronted with unstable training, which easily gets stuck in a poor local optimum. In this paper, we introduce a unified, high-capacity weakly supervised object detection (WSOD) network called HUWSOD, which utilizes a comprehensive self-training framework without needing external modules or additional supervision. HUWSOD innovatively incorporates a self-supervised proposal generator and an autoencoder proposal generator with a multi-rate resampling pyramid to replace traditional object proposals, enabling end-to-end WSOD training and inference. Additionally, we implement a holistic self-training scheme that refines detection scores and coordinates through step-wise entropy minimization and consistency-constraint regularization, ensuring consistent predictions across stochastic augmentations of the same image. Extensive experiments on PASCAL VOC and MS COCO demonstrate that HUWSOD competes with state-of-the-art WSOD methods, eliminating the need for offline proposals and additional data. The peak performance of HUWSOD approaches that of fully-supervised Faster R-CNN. Our findings also indicate that randomly initialized boxes, although significantly different from well-designed offline object proposals, are effective for WSOD training.
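The two self-training ingredients named in the abstract, entropy minimization and consistency across augmentations, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the use of Shannon entropy over per-box class scores, and the mean-squared-difference penalty are all assumptions chosen for illustration, and the step-wise schedule described in the paper is not modeled here.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one discrete score distribution.

    Minimizing this pushes a prediction toward a single confident class.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def consistency_penalty(scores_a, scores_b, boxes_a, boxes_b):
    """Mean squared difference between predictions from two augmented views.

    Hypothetical helper: assumes the i-th entries of both views refer to the
    same proposal and that boxes (x1, y1, x2, y2) were already mapped back
    to a shared coordinate frame.
    """
    score_term = sum((a - b) ** 2 for a, b in zip(scores_a, scores_b)) / len(scores_a)
    box_term = sum(
        (a - b) ** 2
        for box_a, box_b in zip(boxes_a, boxes_b)
        for a, b in zip(box_a, box_b)
    ) / (4 * len(boxes_a))
    return score_term + box_term


# A confident prediction has lower entropy than an uncertain one.
uncertain = entropy([0.5, 0.5])
confident = entropy([0.9, 0.1])

# Identical predictions across the two views incur no penalty.
same_view = consistency_penalty(
    [0.9, 0.2], [0.9, 0.2],
    [(0.0, 0.0, 10.0, 10.0), (5.0, 5.0, 20.0, 20.0)],
    [(0.0, 0.0, 10.0, 10.0), (5.0, 5.0, 20.0, 20.0)],
)
```

In a training loop, terms like these would be added to the detection loss with weights; the paper's own weighting is studied in its loss-weight ablation (Figure 5).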

Paper Structure (34 sections, 10 equations, 9 figures, 9 tables)

Figures (9)

  • Figure 1: An illustration of common WSOD (a) and the proposed HUWSOD framework (b). Our key innovations compared to existing methods are end-to-end object proposal generators and a holistic self-training scheme.
  • Figure 2: The overall framework of the proposed HUWSOD. We replace the traditional object proposals with a self-supervised object proposal generator (SSOPG) and an autoencoder object proposal generator (AEOPG), which hypothesize object locations in an end-to-end manner. We further construct a multi-rate resampling pyramid (MRRP) to connect the backbone and the WSOD head, which strengthens the object proposals. The optimization algorithm is based on a holistic self-training scheme that consists of step-wise entropy minimization (SEM) and consistency-constraint regularization (CCR).
  • Figure 3: Autoencoder architecture in AEOPG.
  • Figure 4: Illustration of MRRP on the $4^\mathrm{th}$ and $5^\mathrm{th}$ stages of backbone. We set $n^{\mathrm{stg}} = 2$, $n^{\mathrm{pls}} = 3$ and $\alpha^{\mathrm{vdl}} = \{1, 2, 3\}$.
  • Figure 5: Ablation study of loss weights on the PASCAL VOC 2007 test set in terms of mAP (%).
  • ...and 4 more figures