Uncertainty Quantification with Deep Ensembles for 6D Object Pose Estimation

Kira Wursthorn; Markus Hillemann; Markus Ulrich

TL;DR

This work addresses reliable 6D object pose estimation under uncertainty by applying deep ensembles to SurfEmb, a top-performing multi-stage pose estimator, and by introducing a regression calibration score, the Uncertainty Calibration Score (UCS), to quantify the quality of the estimated uncertainty. Pose uncertainty is modeled via a posterior predictive distribution over the ensemble predictions, approximated as a Gaussian with mean $\mu$ and variance $\sigma^2$, and its calibration is assessed with reliability diagrams. Experiments on T-LESS and YCB-V show that a 10-member SurfEmb ensemble yields well-calibrated uncertainties (high UCS) and often improves the pose recall metrics MSPD, MSSD, and VSD compared to single models, though calibration degrades through the PnP and refinement steps. The study highlights the effect of the orientation representation (Rodrigues axis-angle performing best) and argues for end-to-end differentiable PnP to better propagate uncertainty, offering practical guidance for deploying reliable multi-stage pose systems in real-world settings.
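The Gaussian approximation mentioned above can be illustrated with a minimal sketch: each ensemble member contributes one pose parameter vector, and the per-component sample mean and variance serve as $\mu$ and $\sigma^2$. The function name `ensemble_gaussian`, the example values, and the choice of the unbiased sample variance are assumptions made for illustration and are not taken from the paper. Treating components independently is a simplification, and orientation parameters in particular may need more careful handling than plain averaging.

```python
from statistics import fmean, variance


def ensemble_gaussian(member_predictions):
    """Approximate the ensemble predictions by an independent Gaussian per component.

    member_predictions: list of M vectors (one per ensemble member), each with
    D pose parameters. Returns (mu, var), two lists of length D.
    """
    # Transpose so that each tuple holds one pose component across all members.
    components = list(zip(*member_predictions))
    mu = [fmean(c) for c in components]
    # Unbiased sample variance (divides by M - 1); this choice is an assumption.
    var = [variance(c) for c in components]
    return mu, var


# Hypothetical position estimates (x, y, z) from a 3-member ensemble.
member_positions = [
    [100.0, 20.0, 500.0],
    [102.0, 19.0, 505.0],
    [98.0, 21.0, 495.0],
]
mu, var = ensemble_gaussian(member_positions)
print(mu, var)  # [100.0, 20.0, 500.0] [4.0, 1.0, 25.0]
```

The spread along the third component is the largest in this toy example, which is the kind of per-component uncertainty a downstream system could use to decide whether a pose estimate is trustworthy.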

Abstract

The estimation of 6D object poses is a fundamental task in many computer vision applications. Particularly in high-risk scenarios such as human-robot interaction, industrial inspection, and automation, reliable pose estimates are crucial. In recent years, increasingly accurate and robust deep-learning-based approaches for 6D object pose estimation have been proposed. Many top-performing methods are not end-to-end trainable but consist of multiple stages. In the context of deep uncertainty quantification, deep ensembles are considered state of the art since they have been proven to produce well-calibrated and robust uncertainty estimates. However, deep ensembles can only be applied to methods that can be trained end-to-end. In this work, we propose a method to quantify the uncertainty of multi-stage 6D object pose estimation approaches with deep ensembles. For the implementation, we choose SurfEmb as a representative, since it is one of the top-performing 6D object pose estimation approaches in the BOP Challenge 2022. We apply established metrics and concepts for deep uncertainty quantification to evaluate the results. Furthermore, we propose a novel uncertainty calibration score for regression tasks to quantify the quality of the estimated uncertainty.
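One established concept for evaluating regression uncertainty is the reliability diagram. The sketch below shows a common interval-based construction under a Gaussian assumption: for each confidence level, it measures how often the ground truth falls inside the corresponding central interval of the predicted distribution. This is an illustrative assumption about how such a diagram can be computed, not the paper's implementation, and it does not reproduce the proposed calibration score. The function name `reliability_curve` and the simulated data are hypothetical.

```python
import random
from statistics import NormalDist


def reliability_curve(y_true, mu, sigma, levels):
    """Observed coverage of central Gaussian intervals at each confidence level.

    For a well-calibrated predictor, the observed coverage at level p is close
    to p, so the curve follows the diagonal of the reliability diagram.
    """
    observed = []
    for p in levels:
        # Half-width of the central p-interval in units of the predicted sigma.
        z = NormalDist().inv_cdf(0.5 + p / 2.0)
        hits = sum(abs(y - m) <= z * s for y, m, s in zip(y_true, mu, sigma))
        observed.append(hits / len(y_true))
    return observed


# Simulated errors with a true standard deviation of 0.3 (hypothetical data).
rng = random.Random(0)
n = 20000
y_true = [rng.gauss(0.0, 0.3) for _ in range(n)]
mu = [0.0] * n
levels = [0.1 * k for k in range(1, 10)]

# Predicted sigma matches the true spread: curve stays near the diagonal.
calibrated = reliability_curve(y_true, mu, [0.3] * n, levels)
# Predicted sigma is too small (overconfident): curve falls below the diagonal.
overconfident = reliability_curve(y_true, mu, [0.1] * n, levels)
```

Plotting `levels` against `calibrated` and `overconfident` gives two reliability curves; the distance of each curve from the diagonal indicates how far the predicted uncertainty is from being calibrated.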

Paper Structure (16 sections, 3 equations, 6 figures)

Introduction
Related work
Uncertainty Quantification in Deep Learning
Uncertainty Quantification for Object Pose Estimation
Background of SurfEmb and Deep Ensembles
SurfEmb
Deep Ensembles
Methodology
SurfEmb Deep Ensemble
Ensemble Evaluation
Uncertainty Evaluation
Experiments
Evaluation of the Ensemble Pose Estimates
Evaluation of the Ensemble Uncertainty
Discussion
...and 1 more section

Figures (6)

Figure 1: UCS for simulated uncertainty predictions with a ground truth standard deviation $\sigma_{true} = 0.3$ (dashed line). For a perfect calibration, where the simulated and predicted uncertainty match, UCS is close to $1$.
Figure 2: Two examples of RGB images from different scenes of the BOP test dataset of T-LESS.
Figure 3: Evaluation results of the differently trained SurfEmb models on the BOP test datasets of T-LESS and YCB-V, both without (RGB) and with depth refinement (RGB-D). Shown are the reproduced $AR$ of the models trained and provided by the authors of SurfEmb (Pretrained), the mean $AR$ of the randomly initialized ensemble members (Baseline), and the evaluation results of the mean poses of the ensembles (Ensemble).
Figure 4: Reliability diagram of the T-LESS query model ensemble with the optimal ensemble size of eight ensemble members. Perfect calibration (the diagonal) is represented by the dashed gray line.
Figure 5: Reliability diagrams of the estimated ensemble orientation and position components on T-LESS and YCB-V. Perfect calibration is represented by the dashed gray line.
...and 1 more figure
