BLADE: Box-Level Supervised Amodal Segmentation through Directed Expansion

Zhaochen Liu, Zhixuan Li, Tingting Jiang

TL;DR

This work introduces an elaborately designed connectivity loss for overlapping regions. The loss leverages correlations with visible masks to guide accurate amodal segmentation, and the resulting method outperforms existing state-of-the-art methods by large margins.

Abstract

Perceiving the complete shape of occluded objects is essential for human and machine intelligence. The amodal segmentation task is to predict the complete mask of partially occluded objects, but annotating pixel-level ground-truth amodal masks is time-consuming and labor-intensive. Box-level supervised amodal segmentation addresses this challenge by relying solely on ground-truth bounding boxes and instance classes as supervision, thereby removing the need for exhaustive pixel-level annotations. However, current box-level methods are limited to low-resolution masks with imprecise boundaries, which fall short of the demands of practical real-world applications. We present a novel solution to this problem: a directed expansion approach that grows visible masks into their corresponding amodal masks. Our approach uses a hybrid end-to-end network built around the overlapping region, i.e., the area where different instances intersect. Different segmentation strategies are applied to overlapping and non-overlapping regions according to their distinct characteristics. To guide the expansion of visible masks, we introduce an elaborately designed connectivity loss for overlapping regions, which leverages correlations with visible masks and facilitates accurate amodal segmentation. Experiments on several challenging datasets show that our proposed method outperforms existing state-of-the-art methods by large margins.
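The hybrid output strategy (also summarized in the Figure 2 caption) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each branch yields an HxW probability map per instance, and the function name `compose_amodal_mask` and the 0.5 binarization threshold are hypothetical choices. Inside the predicted overlapping region the coarse amodal prediction is used; everywhere else the visible prediction is kept.

```python
import numpy as np


def compose_amodal_mask(visible_mask, coarse_amodal_mask, region_map, threshold=0.5):
    """Hypothetical sketch of the final output composition.

    All inputs are HxW arrays of probabilities in [0, 1]. The 0.5 threshold
    is an assumption used both to binarize the region map and the result.
    """
    visible = np.asarray(visible_mask, dtype=float)
    amodal = np.asarray(coarse_amodal_mask, dtype=float)
    # Pixels the region branch marks as belonging to the overlapping region.
    in_region = np.asarray(region_map, dtype=float) >= threshold
    # Coarse amodal prediction inside the overlapping region, visible elsewhere.
    combined = np.where(in_region, amodal, visible)
    return combined >= threshold


# Usage: the visible mask covers columns 0-1, the region branch flags column 2,
# so the amodal prediction extends the final mask into column 2 only.
v = np.array([[1, 1, 0, 0]] * 2, dtype=float)
a = np.ones((2, 4))
r = np.array([[0, 0, 1, 0]] * 2, dtype=float)
print(compose_amodal_mask(v, a, r).astype(int))
```

Restricting the amodal prediction to the overlapping region means any over-expansion of the coarse amodal mask outside that region cannot leak into the final result.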

Paper Structure (24 sections, 14 equations, 5 figures, 2 tables)

This paper contains 24 sections, 14 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: An illustration of the overlapping region. The overlapping region of an object is the tightest bounding box that covers all intersections of its amodal bounding box with those of other objects, so the occluded portion of each object, if any, should lie inside it.
  • Figure 2: A schematic illustration of the proposed BLADE approach. Extracted features, together with relative coordinate maps, are fed to the visible branch, amodal branch, and region branch, which predict the visible mask, coarse amodal mask, and overlapping region map of each instance, respectively; all three branches use dynamically generated, instance-aware mask heads. To exploit the correlation between the masks, the predicted visible masks are also fed to the amodal branch for the proposed connectivity loss, which directs the expansion from visible masks to the corresponding coarse amodal masks. The final output uses the coarse amodal mask inside the predicted overlapping region and the visible mask elsewhere.
  • Figure 3: If there are multiple intersecting areas, their envelope box is used as the ground-truth overlapping region. In the example shown, both $\mathbf{B}_a^{j_1}$ and $\mathbf{B}_a^{j_2}$ overlap $\mathbf{B}_a^{i}$, so the red box $\mathbf{R}^i$ is defined as the overlapping region of instance $i$.
  • Figure 4: An illustration of the connectivity loss. (a) The connectivity loss contains two terms: the neighbor loss and the uniform loss. The neighbor loss measures the label consistency of each pixel with its neighbors in $\mathbf{m}_a$, while the uniform loss measures the consistency of corresponding pixels between $\mathbf{m}_a$ and $\mathbf{m}_v$. (b) The neighbor loss is applied to pixels within the overlapping region that are predicted visible (region ①), while the uniform loss is applied to the whole overlapping region $\mathbf{R}$ (regions ① and ②). (c) Under the connectivity loss, an active band forms as the starting point of expansion. The multiple losses on the amodal branch balance encouraging and inhibiting expansion, thereby directing a moderate expansion.
  • Figure 5: Qualitative examples of our approach compared with the corresponding ground-truth amodal masks and the estimates of BBTP, BoxInst, and Bayesian-Amodal (both the known-$c$ and unknown-$c$ models). Zoom in for a better view.
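The ground-truth overlapping region described in Figures 1 and 3 (the envelope of all intersections between an instance's amodal box and the other instances' amodal boxes) can be computed from boxes alone. The sketch below is illustrative and rests on assumptions: boxes are `(x1, y1, x2, y2)` tuples, zero-area (edge-touching) intersections are ignored, and `None` is returned when an instance intersects nothing. The helper names `intersect` and `overlapping_region` are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Return the intersection of two boxes, or None if it has zero area."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    if x1 < x2 and y1 < y2:
        return (x1, y1, x2, y2)
    return None


def overlapping_region(amodal_boxes: List[Box], i: int) -> Optional[Box]:
    """Envelope (tightest enclosing box) of all intersections between
    instance i's amodal box and every other instance's amodal box."""
    pieces = []
    for j, other in enumerate(amodal_boxes):
        if j == i:
            continue
        piece = intersect(amodal_boxes[i], other)
        if piece is not None:
            pieces.append(piece)
    if not pieces:
        return None  # no occluder, so no overlapping region
    return (
        min(p[0] for p in pieces),
        min(p[1] for p in pieces),
        max(p[2] for p in pieces),
        max(p[3] for p in pieces),
    )


# Usage mirroring Figure 3: two boxes (j1, j2) both overlap box i = 0.
boxes = [(0, 0, 10, 10), (6, 2, 12, 4), (7, 6, 12, 8)]
print(overlapping_region(boxes, 0))  # (6, 2, 10, 8)
```

Here the two intersections are `(6, 2, 10, 4)` and `(7, 6, 10, 8)`, and their envelope `(6, 2, 10, 8)` plays the role of the red box $\mathbf{R}^i$.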
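The Figure 4 caption describes the two terms of the connectivity loss but not their exact formulas, so the following is only a hypothetical stand-in that follows the caption's structure. It assumes mean absolute differences as the consistency measure, 4-connected neighbors with edge padding, a 0.5 threshold to decide which pixels are "predicted visible", and unit weights `w_neighbor` and `w_uniform`; none of these choices come from the paper. The neighbor term is evaluated on region ① (pixels of $\mathbf{R}$ predicted visible), and the uniform term on all of $\mathbf{R}$.

```python
import numpy as np


def neighbor_loss(m_a, m_v, region, vis_thresh=0.5):
    """Hypothetical neighbor term: mean |m_a(p) - m_a(q)| over the 4-neighbors q
    of each pixel p in region 1 (inside R and predicted visible in m_v)."""
    m_a = np.asarray(m_a, dtype=float)
    sel = np.asarray(region, dtype=bool) & (np.asarray(m_v, dtype=float) >= vis_thresh)
    if not sel.any():
        return 0.0
    h, w = m_a.shape
    padded = np.pad(m_a, 1, mode="edge")  # border pixels compare with themselves
    diffs = [
        np.abs(m_a - padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w])
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
    ]
    per_pixel = np.mean(diffs, axis=0)  # average over the 4 neighbors
    return float(per_pixel[sel].mean())


def uniform_loss(m_a, m_v, region):
    """Hypothetical uniform term: mean |m_a - m_v| over the whole region R."""
    region = np.asarray(region, dtype=bool)
    if not region.any():
        return 0.0
    diff = np.abs(np.asarray(m_a, dtype=float) - np.asarray(m_v, dtype=float))
    return float(diff[region].mean())


def connectivity_loss(m_a, m_v, region, w_neighbor=1.0, w_uniform=1.0):
    """Weighted sum of the two hypothetical terms (weights are assumptions)."""
    return (w_neighbor * neighbor_loss(m_a, m_v, region)
            + w_uniform * uniform_loss(m_a, m_v, region))


# Usage: a 1x4 strip fully inside R, visible in columns 0-1.
m_v = np.array([[1.0, 1.0, 0.0, 0.0]])
R = np.ones_like(m_v, dtype=bool)
print(neighbor_loss(m_v, m_v, R))  # 0.125: column 1 disagrees with its right neighbor
print(connectivity_loss(np.ones_like(m_v), m_v, R))  # 0.5: all of it from the uniform term
```

In this toy form the two terms pull in opposite directions, echoing the caption's "balance of encouragement and inhibition": the neighbor term penalizes a sharp edge at visible pixels (favoring expansion into neighbors), while the uniform term penalizes amodal pixels that disagree with the visible mask across $\mathbf{R}$.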