BOP-Distrib: Revisiting 6D Pose Estimation Benchmarks for Better Evaluation under Visual Ambiguities

Boris Meden, Asma Brazi, Fabrice Mayran de Chamisso, Steve Bourgeois, Vincent Lepetit

TL;DR

This work addresses the problem that ground-truth annotations for 6D pose estimation often ignore per-image visual ambiguities caused by symmetry and occlusions, which can mislead method evaluation. It introduces an automatic per-image ground-truth distribution mechanism that combines an image-specific symmetry pattern with a ground-truth pose to produce an SE(3) distribution for each image, enabling fair assessment of both single-pose and multi-modal distribution methods on real data. The authors re-evaluate state-of-the-art single-pose methods on the re-annotated datasets (T-LESS and YCB-V), observing substantial shifts in rankings, and propose a precision/recall framework for pose-distribution evaluation with MPD/MSD-based distance measures. They further provide comprehensive methodology, validation, and discussion on limitations and downstream consequences, demonstrating that per-image ground truth is crucial for reliable benchmarking and for enabling robust downstream tasks like grasping or next-best-view planning.

Abstract

6D pose estimation aims at determining the object pose that best explains the camera observation. The unique solution for non-ambiguous objects can turn into a multi-modal pose distribution for symmetrical objects or when occlusions of symmetry-breaking elements happen, depending on the viewpoint. Currently, 6D pose estimation methods are benchmarked on datasets that consider, for their ground truth annotations, visual ambiguities as only related to global object symmetries, whereas they should be defined per-image to account for the camera viewpoint. We thus first propose an automatic method to re-annotate those datasets with a 6D pose distribution specific to each image, taking into account the object surface visibility in the image to correctly determine the visual ambiguities. Second, given this improved ground truth, we re-evaluate the state-of-the-art single pose methods and show that this greatly modifies the ranking of these methods. Third, as some recent works focus on estimating the complete set of solutions, we derive a precision/recall formulation to evaluate them against our image-wise distribution ground truth, making it the first benchmark for pose distribution methods on real images.

Paper Structure (43 sections, 11 equations, 19 figures, 4 tables)

Figures (19)

  • Figure 1: We provide for the first time 6D pose annotations in the form of a per-image object pose distribution. Current annotations in BOP hodan2024bop datasets are given as a single pose, shown here as a circle in the SO(3) representations. BOP also provides a symmetry pattern per object, from which a distribution can be computed (the colored points in SO(3)). Such a distribution, however, does not cover many cases manhardt2019explaining: in this example, when only the core is visible (Case 1), the pose is fully ambiguous and should be represented by a continuous distribution in SO(3). When the sides of the head are visible (Case 2), there are still ambiguities and the distribution is made of 6 modes. When the hole is visible (Case 3), the pose distribution should be concentrated around one non-ambiguous pose. Our method annotates scenes with per-image distributions, taking into account the partial occlusions and allowing us to evaluate a predicted pose properly. We show that considering these distributions for evaluation results in a significant change of ranking for the BOP challenge. Such ground truth distributions also become a key asset when it comes to evaluating pose distribution estimation methods haugaard2023spyropose, hsiao2024confronting. With appropriate metrics, we demonstrate the first quantitative evaluation of pose distribution methods on real images, as an extension to single pose methods.
  • Figure 2: Method overview. From a symmetry candidate set, we pre-compute the object per-vertex $\epsilon\text{-sym}$. Then, for a given scene, we compute the vertex visibility ($\checkmark$ and ✗ indicate whether the visibility test passed or failed for the vertex, respectively) and perform a robust intersection between the $\epsilon\text{-sym}$ of the visible vertices. This intersection is then pruned with a depth comparison, and the result constitutes the symmetry pattern of this object instance for this image. When multiplied by the ground truth pose, it yields the SE(3) distribution of the object instance.
  • Figure 3: Visualizations of our ground truth. We display SE(3) ground truth distributions for scene 1 of T-LESS hodanTLESSRGBDDataset2017. The circle on each orientation diagram represents the unique ground truth pose provided as input to our method. Colors link objects to their distributions.
  • Figure 4: Depth deviation post-processing analysis. For a given image, we display the depth renderings of the ground truth pose and of one $\epsilon\text{-sym}$ mode (1 here). They align well.
  • Figure 5: Depth deviation post-processing analysis. For a given image, we display the depth renderings of the ground truth pose and of one $\epsilon\text{-sym}$ mode (12). Mode 12 generates several falsely occluded pixels (where the hole should be) and falsely visible pixels (where the hole is but shouldn't be). Mode 12 is rejected by our pruning stage.
  • ...and 14 more figures
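
The last step described in the Figure 2 caption, multiplying the per-image symmetry pattern by the ground truth pose to obtain the SE(3) distribution, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' code: the helper names `rot_z` and `pose_distribution` are hypothetical, the symmetries are taken to be expressed in the object frame and right-multiplied onto the pose, and the 6-fold pattern simply mimics the 6 modes of Case 2 in Figure 1.

```python
import numpy as np


def rot_z(angle_rad):
    """Hypothetical helper: 4x4 homogeneous rotation about the object's z-axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    return T


def pose_distribution(T_gt, symmetry_pattern):
    """Compose each symmetry (object frame, assumed convention) with the
    ground truth pose to get one equally plausible pose per symmetry."""
    return [T_gt @ S for S in symmetry_pattern]


# Assumed per-image symmetry pattern: 6 discrete rotations of 60 degrees.
symmetry_pattern = [rot_z(k * np.pi / 3) for k in range(6)]

# Assumed ground truth pose: identity rotation, object 0.5 m in front of the camera.
T_gt = np.eye(4)
T_gt[:3, 3] = [0.1, 0.0, 0.5]

modes = pose_distribution(T_gt, symmetry_pattern)
print(len(modes), "modes in the per-image pose distribution")
```

Under these assumptions, an unambiguous view (Case 3 in Figure 1) corresponds to a pattern that contains only the identity, so the distribution collapses to the single ground truth pose.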