COCO-OLAC: A Benchmark for Occluded Panoptic Segmentation and Image Understanding

Wenbo Wei, Jun Wang, Abhir Bhalerao

TL;DR

COCO-OLAC introduces a large-scale occlusion-annotated benchmark derived from COCO to quantify occlusion impact on panoptic segmentation and to spur occlusion-aware methods. It combines a manually labeled occlusion-level dataset with a simple contrastive learning objective that aligns representations across occlusion levels via a triplet loss and a joint loss $L_{fin}=L_{seg}+L_{con}$. Across extensive experiments, occlusion significantly degrades SOTA models, but the proposed contrastive approach yields consistent improvements and achieves SOTA results on COCO-OLAC. This work provides a pragmatic occlusion benchmark and a lightweight, effective strategy for learning occlusion-robust representations in image understanding tasks.
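The joint objective above can be illustrated with a minimal sketch. It assumes the standard triplet margin formulation with Euclidean distance; the margin value, the toy embeddings, the placeholder segmentation loss, and the way triplets are chosen across occlusion levels are all hypothetical here and are not taken from the paper. Only the unweighted sum $L_{fin}=L_{seg}+L_{con}$ comes from the summary itself.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The margin of 1.0 is an illustrative assumption, not a value from the paper.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def joint_loss(seg_loss, con_loss):
    """L_fin = L_seg + L_con, the unweighted sum stated in the TL;DR."""
    return seg_loss + con_loss


# Hypothetical 2-D embeddings. Which images act as anchor, positive, and
# negative with respect to occlusion level is an assumption of this sketch.
anchor = [0.0, 0.0]
positive = [3.0, 4.0]   # distance 5 from the anchor
negative = [0.0, 2.0]   # distance 2 from the anchor

l_con = triplet_loss(anchor, positive, negative)  # max(0, 5 - 2 + 1) = 4.0
l_seg = 0.5                                       # placeholder segmentation loss
l_fin = joint_loss(l_seg, l_con)                  # 0.5 + 4.0 = 4.5
```

In this toy case the positive lies farther from the anchor than the negative, so the contrastive term is non-zero and pushes the representation to reorder them. Once the negative is at least one margin farther away than the positive, the term drops to zero and only the segmentation loss remains.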

Abstract

To help address the occlusion problem in panoptic segmentation and image understanding, this paper proposes a new large-scale dataset named COCO-OLAC (COCO Occlusion Labels for All Computer Vision Tasks), which is derived from the COCO dataset by manually labelling images into three perceived occlusion levels. Using COCO-OLAC, we systematically assess and quantify the impact of occlusion on panoptic segmentation across samples with different levels of occlusion. Comparative experiments with SOTA panoptic models demonstrate that the presence of occlusion significantly affects performance, with higher occlusion levels resulting in notably poorer performance. Additionally, we propose a straightforward yet effective method as an initial attempt to leverage the occlusion annotation, using contrastive learning to train a model that learns a more robust representation capturing different severities of occlusion. Experimental results demonstrate that the proposed approach boosts the performance of the baseline model and achieves SOTA performance on the proposed COCO-OLAC dataset.

Paper Structure (11 sections, 2 equations, 3 figures, 5 tables)

Figures (3)

  • Figure 1: Visualization of a prediction result by the Mask2Former network (Cheng et al., 2022). The third car from the left in the middle, the truck in the center, and the truck near the roof are not detected due to occlusion.
  • Figure 2: Comparison of sample images with different occlusion levels from the COCO-OLAC dataset. All images are overlaid with ground-truth annotations to better show the occlusion relationships. The left, middle, and right columns show sample images with low, mid, and high occlusion levels, respectively. Only annotated instances are taken into account when assessing the occlusion level.
  • Figure 3: Illustration of the overall approach. The blue area at the bottom is a schematic diagram of the segmentation baseline, while the green area at the top illustrates the proposed contrastive learning-based method.