BoxSeg: Quality-Aware and Peer-Assisted Learning for Box-supervised Instance Segmentation

Jinxiang Lai, Wenlong Wu, Jiawei Zhan, Jian Li, Bin-Bin Gao, Jun Liu, Jie Zhang, Song Guo

TL;DR

This paper introduces BoxSeg, a quality-aware framework for box-supervised instance segmentation (BSIS) that tackles pseudo-mask noise via a Quality-Aware Module (QAM) and enhances low-quality masks through Peer-assisted Copy-paste (PC). The QAM comprises Box-Quality Ranking, Quality-aware Masks Fusion, and Mask-Quality Scoring to produce high-quality pseudo masks and robustly assess mask quality, while PC leverages high-quality peers to guide the learning of poorer masks. The approach integrates into a teacher-student paradigm with EMA updates and a quality-weighted mask loss. It demonstrates state-of-the-art results on COCO and PASCAL VOC, supported by extensive ablations and theoretical bounds on mask fusion. Collectively, BoxSeg provides a general, effective enhancement for BSIS that improves mask accuracy, boundary delineation, and generalization across architectures and training schedules.
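The EMA update mentioned above can be illustrated with a minimal sketch. It uses the standard exponential-moving-average rule, teacher = momentum × teacher + (1 − momentum) × student, applied per parameter. The function name `ema_update`, the dictionary representation of weights, and the momentum value are assumptions made for illustration; the summary does not state the paper's actual settings.

```python
def ema_update(teacher, student, momentum=0.999):
    """Return teacher weights moved toward the student weights by EMA.

    `teacher` and `student` map parameter names to floats. The momentum
    default is an illustrative assumption, not a value from the paper.
    """
    if teacher.keys() != student.keys():
        raise ValueError("teacher and student must share parameter names")
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }


# Toy weights: with momentum 0.9 the teacher moves 10% of the way to the student.
teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 1.0}
teacher = ema_update(teacher, student, momentum=0.9)
```

A momentum close to 1 makes the teacher change slowly, which is the usual reason for generating pseudo labels from the teacher rather than from the student.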

Abstract

Box-supervised instance segmentation methods aim to achieve instance segmentation with only box annotations. Recent methods have demonstrated the effectiveness of acquiring high-quality pseudo masks under the teacher-student framework. Building upon this foundation, we propose a BoxSeg framework involving two novel and general modules named the Quality-Aware Module (QAM) and the Peer-assisted Copy-paste (PC). The QAM obtains high-quality pseudo masks and better measures the mask quality to help reduce the effect of noisy masks, by leveraging the quality-aware multi-mask complementation mechanism. The PC imitates Peer-Assisted Learning to further improve the quality of the low-quality masks with the guidance of the obtained high-quality pseudo masks. Theoretical and experimental analyses demonstrate the proposed QAM and PC are effective. Extensive experimental results show the superiority of our BoxSeg over the state-of-the-art methods, and illustrate the QAM and PC can be applied to improve other models.

Paper Structure

This paper contains 24 sections, 3 theorems, 9 equations, 9 figures, 8 tables, and 1 algorithm.

Key Result

Theorem 3.1

An upper bound on the generalization error of the fused mask $m_i^t$ holds with probability at least $1 - \delta$ (proof in sec:ana_qmf), where $m_i^*$ is the true mask, $\hat{K} = \sum_{n=1}^{K} \mathbbm{1}(s_{i,n} > \tau_m)$ is the number of scores $s_{i,n}$ that exceed the threshold $\tau_m$, and $\epsilon_w$ is the maximum error in the box-quality score.
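As a small worked example of the quantity $\hat{K}$ defined in the theorem, the sketch below counts how many scores strictly exceed the threshold $\tau_m$. The function name `count_confident_masks` and the sample scores are hypothetical; the interpretation of $s_{i,n}$ as a per-candidate score for instance $i$ is an assumption based only on the indicator expression.

```python
def count_confident_masks(scores, tau_m):
    """Count the scores s_{i,n} that strictly exceed the threshold tau_m.

    This mirrors K_hat = sum_{n=1}^{K} 1(s_{i,n} > tau_m); K is len(scores).
    """
    return sum(1 for s in scores if s > tau_m)


# Hypothetical scores for K = 4 candidates of one instance.
scores = [0.91, 0.40, 0.75, 0.62]
k_hat = count_confident_masks(scores, tau_m=0.6)
```

Here three of the four scores exceed 0.6, so `k_hat` is 3. A score exactly equal to `tau_m` is not counted, because the indicator uses a strict inequality.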

Figures (9)

  • Figure 1: Compared to BoxTeacher, BoxSeg integrates two novel modules, the Quality-Aware Module and Peer-assisted Copy-paste, which obtain high-quality pseudo masks and improve the quality of low-quality pseudo masks, respectively.
  • Figure 2: BoxTeacher struggles with (a) distraction from overlapping objects and (b) distraction from similar backgrounds, while BoxSeg obtains more accurate masks under these distractions.
  • Figure 3: The architecture of BoxSeg consists of a Teacher Model, a Student Model, and Peer-assisted Copy-paste. ① The input image $X$ is processed by the Teacher Model $f^{\varphi}$ with the Quality-Aware Module (comprising Box-Quality Ranking, Quality-aware Masks Fusion, and Mask-Quality Scoring) to produce the pseudo masks and estimate their quality. CondInst is chosen as the base segmentation network; it is composed of a backbone $f_{b}^{\varphi}$, a box branch $f_{box}^{\varphi}$, and a mask branch $f_{mask}^{\varphi}$. ② The Student Model $f^{\theta}$ (also a CondInst model) makes predictions on the augmented image $X_s$, is supervised by the pseudo masks and the GT boxes, and then updates the Teacher Model via exponential moving average (EMA). At inference, the image is processed only by the Student Model to predict boxes and masks. ③ Peer-assisted Copy-paste copies peer objects with high-quality pseudo masks and pastes them into the image to assist the optimization of the low-quality masks.
  • Figure 4: Peer-assisted Copy-paste: (1) The Memory Bank collects peer objects that have high-quality pseudo masks and do not overlap with any other object. (2) The peer object is pasted into the image so that it overlaps the input object. The blue line is the low-quality mask of the input object, and the red line is the high-quality mask of the peer object.
  • Figure 5: Visualization results of BoxTeacher (first row) and our BoxSeg (second row) with ResNet-101 on COCO test-dev. (a) Non-overlapped objects. (b) Overlapped inter-class objects. (c) Overlapped intra-class objects.
  • ...and 4 more figures
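The pasting step described in the Figure 4 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `paste_peer`, the array layout, and the caller-supplied paste position are hypothetical, and the Memory Bank and the rule for choosing where to paste are omitted.

```python
import numpy as np


def paste_peer(image, peer_patch, peer_mask, top, left):
    """Paste the masked pixels of `peer_patch` into a copy of `image`.

    `peer_mask` is a boolean (h, w) array marking the peer object's pixels.
    Returns the new image and a full-size boolean mask of the pasted peer.
    """
    h, w = peer_mask.shape
    if top < 0 or left < 0 or top + h > image.shape[0] or left + w > image.shape[1]:
        raise ValueError("peer patch must lie fully inside the image")
    out = image.copy()
    # Basic slicing returns a view, so this write modifies `out` in place.
    region = out[top:top + h, left:left + w]
    region[peer_mask] = peer_patch[peer_mask]
    pasted = np.zeros(image.shape[:2], dtype=bool)
    pasted[top:top + h, left:left + w] = peer_mask
    return out, pasted


# Toy example: a 2x2 peer with three foreground pixels pasted into a 4x4 image.
image = np.zeros((4, 4), dtype=np.uint8)
peer_patch = np.full((2, 2), 9, dtype=np.uint8)
peer_mask = np.array([[True, False], [True, True]])
out, pasted = paste_peer(image, peer_patch, peer_mask, top=1, left=2)
```

Only pixels inside the peer mask are overwritten, so the background of the peer patch does not occlude the target image. The returned full-size mask locates the pasted peer, which is the high-quality mask that the caption describes as guiding the overlapped low-quality one.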

Theorems & Definitions (10)

  • Theorem 3.1
  • Remark 3.2
  • Remark 3.3
  • Remark 3.4
  • Remark 3.5
  • Theorem 3.6
  • Corollary 3.7
  • Remark 3.8
  • Remark 3.9
  • Remark 3.10