COBRA -- COnfidence score Based on shape Regression Analysis for method-independent quality assessment of object pose estimation from single images

Panagiotis Sapoutzoglou, Georgios Giapitzakis, Georgios Floros, George Terzakis, Maria Pateraki

TL;DR

COBRA addresses the lack of a method-agnostic runtime confidence measure for 6D pose estimation from single images by constructing a lightweight, GP-based directional distance field template learned from sparse interior reference points. Pose quality is scored by comparing back-projected image points against this template via a mixture of GP priors, yielding a probabilistic confidence that correlates with traditional pose accuracy metrics such as ADD. The approach is validated on ShapeNetCore and IndustryShapes, showing strong shape-representation fidelity (low Chamfer distance and high F-score) and a robust negative correlation between ADD and COBRA confidence, with kernel choice and the number of reference points identified as key design factors. These results demonstrate a practical, interpretable, and method-agnostic tool for assessing pose estimates in robotics and vision applications, including real-world industrial scenarios. The work also highlights limitations around reference-point placement and suggests future work on automated coverage guarantees and fully automated template construction.

Abstract

We propose a generic procedure for assessing 6D object pose estimates. Our approach evaluates discrepancies in the geometry of the observed object, specifically its estimated back-projection in 3D, against a putative functional shape representation that acts as a template and comprises mixtures of Gaussian Processes. Each Gaussian Process is trained to yield a fragment of the object's surface in a radial fashion with respect to designated reference points. We further define a pose confidence measure as the average probability of pixel back-projections in the Gaussian mixture. The goal of our experiments is two-fold: (a) we demonstrate that our functional representation is sufficiently accurate as a shape template on which the probability of back-projected object points can be evaluated, and (b) we show that the resulting confidence scores based on these probabilities are indeed a consistent quality measure of pose.
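
The idea of scoring a pose as an average per-point probability can be illustrated with a minimal sketch. It is not the paper's confidence equation: the function name `pose_confidence`, the inputs `mean` and `std` (standing in for a Gaussian Process's predictive mean and standard deviation of the radial distance along each back-projected point's direction), and the peak-normalised Gaussian score are all assumptions made for illustration.

```python
import numpy as np

def pose_confidence(observed, mean, std):
    """Average per-point score in (0, 1]; 1.0 means every point lies on the template.

    observed: radial distances of back-projected points from their reference point.
    mean, std: hypothetical GP predictive mean and standard deviation at the same directions.
    """
    observed, mean, std = (np.asarray(a, dtype=float) for a in (observed, mean, std))
    z = (observed - mean) / std
    # ASSUMPTION: Gaussian density divided by its peak value, so each score is in (0, 1].
    scores = np.exp(-0.5 * z ** 2)
    return float(scores.mean())
```

Under this scoring, points whose observed distance matches the predicted one contribute 1, and the contribution decays as the discrepancy grows relative to the predictive uncertainty, so a poor pose lowers the average.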

Paper Structure (35 sections, 26 equations, 28 figures, 3 tables)

Figures (28)

  • Figure 1: Our lightweight, GP-based representation (left) has the capacity to capture the shape variability of complex real-world objects, thus providing a reliable confidence score for 6D pose estimates (right).
  • Figure 2: Off-line stage: We partition a sparse point cloud using distance-based clustering and extract reference points (Sec. \ref{ssect:lightweighht_shape_templates}). Each 3D point $P_i$ is parameterized as a bearing vector in spherical coordinates $U_i$ relative to its reference point $C_i$. The directions serve as inputs, and the corresponding distances as targets, for a Gaussian Process ($GP_i$) (Sec. \ref{ssect:GP_prior_for_distance}). The GP mixture model forms our shape representation (template). On-line stage: Using an estimated 6D pose and 2D-3D correspondences, we back-project the object's pixels, parameterize them, and predict distances using the template. Finally, a confidence score (Eq. \ref{eq:generic_confidence}) and a lower bound (Eq. \ref{eq:confidence_bound}) are computed to assess the pose quality.
  • Figure 3: Spherical directional distance field (blue rays) centered at reference point $\pmb{C}$ (red).
  • Figure 4: Ray-casting to the object's surface (blue) from reference points (red), retaining only the first intersection. Coverage becomes denser as the number of reference points increases (left to right).
  • Figure 5: Assignment of training points to clusters. (a)-(b) Ground truth - reconstructed without overlap. (c)-(d) Ground truth - reconstructed with overlap (blue).
  • ...and 23 more figures
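
The parameterization described in the Figure 2 caption, where each 3D point is expressed as a direction in spherical coordinates plus a distance relative to a reference point, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `to_spherical` and `from_spherical` and the azimuth/polar-angle convention are hypothetical and not taken from the paper.

```python
import numpy as np

def to_spherical(points, center):
    """Return (angles, distances) of points relative to a reference point.

    angles[:, 0] is the azimuth in (-pi, pi]; angles[:, 1] is the polar angle in [0, pi].
    ASSUMPTION: this angle convention is chosen for illustration only.
    """
    d = np.asarray(points, dtype=float) - np.asarray(center, dtype=float)
    r = np.linalg.norm(d, axis=1)
    azimuth = np.arctan2(d[:, 1], d[:, 0])
    # Avoid dividing by zero when a point coincides with the reference point.
    safe_r = np.where(r > 0, r, 1.0)
    polar = np.arccos(np.clip(d[:, 2] / safe_r, -1.0, 1.0))
    return np.stack([azimuth, polar], axis=1), r

def from_spherical(angles, distances, center):
    """Inverse mapping: rebuild 3D points from directions and distances."""
    angles = np.asarray(angles, dtype=float)
    distances = np.asarray(distances, dtype=float)
    az, pol = angles[:, 0], angles[:, 1]
    unit = np.stack(
        [np.sin(pol) * np.cos(az), np.sin(pol) * np.sin(az), np.cos(pol)], axis=1
    )
    return np.asarray(center, dtype=float) + distances[:, None] * unit
```

In this sketch the angle pairs would play the role of regression inputs and the distances the role of targets; the round trip through `from_spherical` recovers the original points, which is what lets a predicted distance along a given direction be turned back into a surface point.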