BOP Challenge 2023 on Detection, Segmentation and Pose Estimation of Seen and Unseen Rigid Objects

Tomas Hodan, Martin Sundermeyer, Yann Labbe, Van Nguyen Nguyen, Gu Wang, Eric Brachmann, Bertram Drost, Vincent Lepetit, Carsten Rother, Jiri Matas

TL;DR

The paper documents the BOP Challenge 2023, which extends model-based 6D object pose estimation to objects unseen during training: methods must learn each new object from its 3D model in a short onboarding stage (max 5 minutes, 1 GPU). The best unseen-object method (GenFlow) reached the accuracy of CosyPose, the best 2020 method for seen objects, although it is noticeably slower. The best seen-object method (GPose) achieved a moderate accuracy gain and a 43% run-time improvement over the best 2022 method (GDRNPP). The results also show progress in 2D detection and segmentation and notable potential in unseen-object localization. The datasets combine the seven core BOP datasets with the large synthetic MegaPose dataset, which is meant for training methods on the unseen-object tasks, and the online evaluation system remains open for ongoing benchmarking. Remaining gaps include the handling of heavy occlusion and the run time of unseen-object methods.

Abstract

We present the evaluation methodology, datasets and results of the BOP Challenge 2023, the fifth in a series of public competitions organized to capture the state of the art in model-based 6D object pose estimation from an RGB/RGB-D image and related tasks. Besides the three tasks from 2022 (model-based 2D detection, 2D segmentation, and 6D localization of objects seen during training), the 2023 challenge introduced new variants of these tasks focused on objects unseen during training. In the new tasks, methods were required to learn new objects during a short onboarding stage (max 5 minutes, 1 GPU) from provided 3D object models. The best 2023 method for 6D localization of unseen objects (GenFlow) notably reached the accuracy of the best 2020 method for seen objects (CosyPose), although it is noticeably slower. The best 2023 method for seen objects (GPose) achieved a moderate accuracy improvement but a significant 43% run-time improvement compared to the best 2022 counterpart (GDRNPP). Since 2017, the accuracy of 6D localization of seen objects has improved by more than 50% (from 56.9 to 85.6 AR_C). The online evaluation system stays open and is available at: http://bop.felk.cvut.cz/.
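
As a quick check of the "more than 50%" figure, the relative gain from 56.9 to 85.6 AR_C can be computed directly. This is only an arithmetic sketch; the helper name `relative_improvement` is illustrative and is not part of any BOP tooling.

```python
def relative_improvement(old: float, new: float) -> float:
    """Return the relative change from `old` to `new` as a percentage."""
    return (new - old) / old * 100.0


# AR_C scores quoted in the abstract for 6D localization of seen objects.
ar_2017 = 56.9
ar_2023 = 85.6

gain = relative_improvement(ar_2017, ar_2023)
print(f"Relative AR_C improvement, 2017 to 2023: {gain:.1f}%")
```

The absolute gain is 28.7 AR_C points, which is slightly more than half of the 2017 score.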

Paper Structure (19 sections, 4 figures, 7 tables)

Figures (4)

  • Figure 1: Progress in model-based 6D object localization (2017–2023). Shown are the accuracy and run time of the top-performing RGB-D methods on the seven core BOP datasets. The dominance of methods based on point-pair features [drost2010model], represented by Vidal et al. [vidal2018method] in 2017, was ended in 2020 by the learning-based CosyPose [labbe2020cosypose] at the price of a significantly higher run time. In 2022, GDRNPP [Wang_2021_GDRN, liu2022gdrnpp_bop] dramatically improved both accuracy and run time. Finally, in 2023, GPose [gpose2023] brought the run time back to the 2017 level while further improving the accuracy. The field has come a long way since 2017: the accuracy has improved by more than 50% (from 56.9 to 85.6 AR_C). GenFlow [genflow], the best method for the newly introduced task of 6D localization of unseen objects (objects not seen during training), reaches the accuracy of CosyPose, the best 2020 method for seen objects, while its run time awaits improvements.
  • Figure 2: An overview of the BOP datasets. The seven core datasets are marked with a star. Shown are RGB channels of sample test images which were darkened and overlaid with colored 3D object models in the ground-truth 6D poses.
  • Figure 3: Example training images from the MegaPose dataset [megapose]. This dataset includes 2M images showing annotated instances of more than 50K diverse objects and is meant for training methods for tasks on unseen objects (Tasks 4–6). The objects are not present in any other BOP dataset and their 3D models are available.
  • Figure 4: Qualitative comparison of the state-of-the-art methods for 6D localization of seen (GPose) and unseen objects (GenFlow) on sample images from LM-O [brachmann2014learning] and YCB-V [xiang2017posecnn]. The bottom row shows the depth error map of each estimated pose with respect to the ground-truth pose. The map shows the distance between each 3D point in the ground-truth depth map and its position in the estimated pose (darker red indicates higher error, on a scale from 0 cm to 10 cm). While GenFlow demonstrates strong performance on unseen objects, it tends to fail on challenging cases with heavy object occlusion (e.g., the drill in the sample LM-O image or the meat can in the YCB-V image).
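
The per-point error described in the Figure 4 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' visualization code: a pose is assumed to be a 3×3 rotation matrix plus a translation vector mapping model coordinates to camera coordinates, the input points are assumed to be already back-projected from the ground-truth depth map, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def rotate(R: Mat3, v: Vec3) -> Vec3:
    """Apply rotation R to vector v."""
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))


def rotate_inverse(R: Mat3, v: Vec3) -> Vec3:
    """Apply the inverse (transpose) of rotation R to vector v."""
    return tuple(sum(R[j][i] * v[j] for j in range(3)) for i in range(3))


def point_errors(points_cam: Sequence[Vec3],
                 R_gt: Mat3, t_gt: Vec3,
                 R_est: Mat3, t_est: Vec3) -> List[float]:
    """Distance between each ground-truth surface point and the same
    model point placed by the estimated pose (same units as the input)."""
    errors = []
    for p in points_cam:
        # Map the camera-space point back to model space with the GT pose.
        x_model = rotate_inverse(R_gt, tuple(p[i] - t_gt[i] for i in range(3)))
        # Place the same model point in camera space with the estimated pose.
        q = rotate(R_est, x_model)
        p_est = tuple(q[i] + t_est[i] for i in range(3))
        errors.append(math.dist(p, p_est))
    return errors


# Toy example in centimetres: the estimate is shifted 3 cm along the optical axis.
identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
points = [(0.0, 0.0, 50.0), (1.0, 2.0, 51.0)]
errors = point_errors(points, identity, (0.0, 0.0, 50.0), identity, (0.0, 0.0, 53.0))
print(errors)
```

With a pure translation offset every point has the same error. A rotation error would instead produce errors that grow with distance from the rotation axis, which is what makes a per-pixel map informative.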