Uncertainty Quantification for Visual Object Pose Estimation
Lorenzo Shaikewitz, Charis Georgiou, Luca Carlone
TL;DR
This work tackles uncertainty quantification for monocular visual object pose estimation under minimal distributional assumptions. It introduces SLUE (S-Lemma Uncertainty Estimation), a convex program that reduces a non-convex set of pose-uncertainty constraints to a single ellipsoidal bound guaranteed to contain the true pose with high probability. SLUE solves an S-lemma-inspired relaxation of the minimum-volume bounding ellipsoid problem, and a sum-of-squares relaxation hierarchy tightens the bound toward the minimum-volume ellipsoid. A projection step yields separate, interpretable bounds on translation and axis-angle orientation. Evaluated on two pose estimation datasets and a real-world drone tracking scenario, SLUE produces substantially smaller translation bounds than prior work and competitive orientation bounds, while remaining computationally efficient. An open-source implementation is available.
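For readers unfamiliar with the S-lemma named above, its classical single-constraint form is stated below as background. The symbols f, g, A, b, c, and λ are generic notation chosen here for illustration, not the paper's. With several quadratic constraints the multiplier condition is in general only sufficient, which is why an approach built on it is a relaxation.

```latex
% Classical S-lemma (background; notation is illustrative, not the paper's).
Let $f, g : \mathbb{R}^n \to \mathbb{R}$ be quadratic functions and suppose
there exists $\bar{x}$ with $g(\bar{x}) < 0$. Then
\[
  \bigl( g(x) \le 0 \implies f(x) \ge 0 \bigr) \ \ \forall x
  \quad\Longleftrightarrow\quad
  \exists\, \lambda \ge 0 :\ f(x) + \lambda\, g(x) \ge 0 \ \ \forall x .
\]
% Nonnegativity of a quadratic is a linear matrix inequality (A symmetric):
For $f(x) = x^\top A x + 2 b^\top x + c$,
\[
  f(x) \ge 0 \ \ \forall x
  \quad\Longleftrightarrow\quad
  \begin{bmatrix} A & b \\ b^\top & c \end{bmatrix} \succeq 0 .
\]
```

The second equivalence is what makes the right-hand condition checkable by a convex (semidefinite) program.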
Abstract
Quantifying the uncertainty of an object's pose estimate is essential for robust control and planning. Although pose estimation is a well-studied robotics problem, how to attach statistically rigorous uncertainty to an estimate without strict distributional assumptions is not well understood. We develop distribution-free pose uncertainty bounds about a given pose estimate in the monocular setting. Our bounds require only high-probability noise bounds on pixel detections of 2D semantic keypoints on a known object. This noise model induces an implicit, non-convex set of pose uncertainty constraints. Our key contribution is SLUE (S-Lemma Uncertainty Estimation), a convex program that reduces this set to a single ellipsoidal uncertainty bound guaranteed to contain the true object pose with high probability. SLUE solves a relaxation of the minimum-volume bounding ellipsoid problem inspired by the celebrated S-lemma, and it requires no initial guess of the bound's shape or size. For tighter uncertainty bounds at the same confidence, we extend SLUE to a sum-of-squares relaxation hierarchy that is guaranteed to converge to the minimum-volume ellipsoidal uncertainty bound for a given set of keypoint constraints. We show that this pose uncertainty bound can easily be projected to independent translation and axis-angle orientation bounds. We evaluate SLUE on two pose estimation datasets and a real-world drone tracking scenario. Compared to prior work, SLUE generates substantially smaller translation bounds and competitive orientation bounds. We release code at https://github.com/MIT-SPARK/PoseUncertaintySets.
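To illustrate the idea of projecting a joint ellipsoidal bound onto a subset of its coordinates, here is a minimal sketch of the standard linear-algebra construction. It is not taken from the released code: the function name `project_ellipsoid`, the ellipsoid convention {x : (x - c)^T Q (x - c) <= 1}, and the assumption that the coordinates of interest are plain entries of a stacked vector are all hypothetical. The paper's own pose parametrization and its axis-angle projection may differ.

```python
import numpy as np


def project_ellipsoid(Q, center, keep):
    """Project the ellipsoid {x : (x - c)^T Q (x - c) <= 1} onto the
    coordinates listed in `keep`.

    Returns (Q_proj, c_proj) describing the projected ellipsoid
    {y : (y - c_proj)^T Q_proj (y - c_proj) <= 1}.

    Hypothetical helper for illustration; not the paper's API.
    """
    Q = np.asarray(Q, dtype=float)
    center = np.asarray(center, dtype=float)
    keep = np.asarray(keep, dtype=int)

    # The shadow of an ellipsoid on a coordinate subspace is obtained by
    # inverting Q, keeping the sub-block for the retained coordinates,
    # and inverting back (equivalently, a Schur complement of Q).
    P = np.linalg.inv(Q)
    P_keep = P[np.ix_(keep, keep)]
    return np.linalg.inv(P_keep), center[keep]


# Example: a 6-D bound whose first three coordinates are assumed (for
# illustration only) to be the translation components.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
Q_joint = A @ A.T + 6.0 * np.eye(6)   # symmetric positive definite
c_joint = rng.standard_normal(6)

Q_trans, c_trans = project_ellipsoid(Q_joint, c_joint, [0, 1, 2])

# Semi-axis lengths of the projected (translation-only) ellipsoid.
semi_axes = 1.0 / np.sqrt(np.linalg.eigvalsh(Q_trans))
print(semi_axes)
```

The projected ellipsoid is the tightest ellipsoid containing the shadow of the joint bound on the kept coordinates, so any point inside the joint bound is guaranteed to land inside the projected one.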
