BOP-Distrib: Revisiting 6D Pose Estimation Benchmarks for Better Evaluation under Visual Ambiguities
Boris Meden, Asma Brazi, Fabrice Mayran de Chamisso, Steve Bourgeois, Vincent Lepetit
TL;DR
Ground-truth annotations for 6D pose estimation often ignore per-image visual ambiguities caused by symmetries and occlusions, which can mislead the evaluation of methods. This work introduces an automatic re-annotation method that combines an image-specific symmetry pattern with the ground-truth pose to produce a pose distribution over SE(3) for each image, enabling fair assessment of both single-pose and multi-modal distribution methods on real data. The authors re-evaluate state-of-the-art single-pose methods on the re-annotated datasets (T-LESS and YCB-V) and observe substantial shifts in rankings. They also propose a precision/recall framework for evaluating pose-distribution methods, built on MPD/MSD-based distance measures. Finally, they detail the methodology, validate it, and discuss its limitations and downstream consequences, arguing that per-image ground truth is crucial for reliable benchmarking and for downstream tasks such as grasping or next-best-view planning.
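To make the idea of combining a symmetry pattern with a ground-truth pose concrete, here is a minimal sketch. It assumes the image-specific symmetry pattern is already available as a finite list of 4x4 rigid transforms expressed in the object frame; how the paper derives that pattern from surface visibility is not reproduced here. The function names `rot_z` and `ground_truth_distribution` and the example values are hypothetical, chosen only for illustration.

```python
import numpy as np

def rot_z(angle_rad):
    """Homogeneous 4x4 transform for a rotation about the object's z axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    return T

def ground_truth_distribution(T_gt, symmetry_pattern):
    """Compose one annotated pose with each symmetry transform.

    Assumption: symmetry transforms act in the object frame, so they are
    applied on the right of the object-to-camera pose T_gt.
    """
    return [T_gt @ S for S in symmetry_pattern]

# Hypothetical example: an object that looks identical every 90 degrees
# about z in this particular image.
pattern = [rot_z(k * np.pi / 2) for k in range(4)]

T_gt = np.eye(4)
T_gt[:3, 3] = [0.1, 0.0, 0.5]  # annotated translation in metres (made up)

gt_poses = ground_truth_distribution(T_gt, pattern)
```

Under this assumption, an image in which a symmetry-breaking element is visible would get a pattern containing only the identity, and the distribution would reduce to the single annotated pose. A continuous symmetry would have to be sampled to fit this finite-list representation.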
Abstract
6D pose estimation aims at determining the object pose that best explains the camera observation. The unique solution for non-ambiguous objects can turn into a multi-modal pose distribution for symmetrical objects or, depending on the viewpoint, when symmetry-breaking elements are occluded. Currently, 6D pose estimation methods are benchmarked on datasets whose ground-truth annotations treat visual ambiguities as related only to global object symmetries, whereas they should be defined per image to account for the camera viewpoint. We thus first propose an automatic method to re-annotate those datasets with a 6D pose distribution specific to each image, taking into account the object surface visibility in the image to correctly determine the visual ambiguities. Second, given this improved ground truth, we re-evaluate the state-of-the-art single-pose methods and show that this greatly modifies the ranking of these methods. Third, as some recent works focus on estimating the complete set of solutions, we derive a precision/recall formulation to evaluate them against our image-wise distribution ground truth, making it the first benchmark for pose distribution methods on real images.
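As an illustration of what a precision/recall evaluation between a predicted pose set and a per-image ground-truth pose set can look like, here is a minimal sketch. It is not the paper's formulation: it assumes both distributions are given as finite lists of 4x4 poses, uses a simple maximum surface distance over model points as a stand-in distance, and counts a pose as matched when its nearest counterpart lies within a threshold. All names, points, and thresholds below are hypothetical.

```python
import numpy as np

def max_surface_distance(T_a, T_b, points):
    """Largest displacement of any model point between two poses."""
    pa = points @ T_a[:3, :3].T + T_a[:3, 3]
    pb = points @ T_b[:3, :3].T + T_b[:3, 3]
    return float(np.linalg.norm(pa - pb, axis=1).max())

def precision_recall(est_poses, gt_poses, points, threshold):
    """Precision: share of estimated poses close to some ground-truth pose.
    Recall: share of ground-truth poses close to some estimated pose."""
    if not est_poses or not gt_poses:
        return 0.0, 0.0
    d = np.array([[max_surface_distance(e, g, points) for g in gt_poses]
                  for e in est_poses])
    precision = float((d.min(axis=1) <= threshold).mean())
    recall = float((d.min(axis=0) <= threshold).mean())
    return precision, recall

# Hypothetical model points (metres) and poses.
points = np.array([[0.1, 0.0, 0.0], [0.0, 0.05, 0.0], [0.0, 0.0, 0.02]])

gt_identity = np.eye(4)
gt_flipped = np.diag([-1.0, -1.0, 1.0, 1.0])  # 180 degrees about z
gt_poses = [gt_identity, gt_flipped]

est_good = np.eye(4)             # matches the first ground-truth mode
est_bad = np.eye(4)
est_bad[0, 3] = 1.0              # shifted 1 m along x, matches nothing
est_poses = [est_good, est_bad]

precision, recall = precision_recall(est_poses, gt_poses, points, threshold=0.01)
print(precision, recall)
```

In this toy case one of the two estimates is correct and one of the two ground-truth modes is recovered, so both scores are 0.5. A single-pose method evaluated this way can reach full precision while its recall stays low on ambiguous images, which is the kind of distinction a distribution-aware benchmark is meant to expose.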
