Uncertainty Quantification for Visual Object Pose Estimation

Lorenzo Shaikewitz, Charis Georgiou, Luca Carlone

TL;DR

This work addresses uncertainty quantification for monocular visual object pose estimation under minimal distributional assumptions. It introduces SLUE (S-Lemma Uncertainty Estimation), a convex program that reduces a non-convex set of pose constraints to a single ellipsoidal bound guaranteed to contain the true pose with high probability. SLUE relaxes the minimum-volume bounding ellipsoid problem using a generalization of the S-Lemma, and a sum-of-squares hierarchy tightens the bound toward the minimum-volume ellipsoid. A projection step yields separate, interpretable bounds for translation and axis-angle orientation. Evaluated on two pose estimation datasets and a real-world drone tracking scenario, SLUE produces substantially smaller translation bounds than prior work and competitive orientation bounds, and the code is released as open source.

Abstract

Quantifying the uncertainty of an object's pose estimate is essential for robust control and planning. Although pose estimation is a well-studied robotics problem, attaching statistically rigorous uncertainty is not well understood without strict distributional assumptions. We develop distribution-free pose uncertainty bounds about a given pose estimate in the monocular setting. Our pose uncertainty only requires high probability noise bounds on pixel detections of 2D semantic keypoints on a known object. This noise model induces an implicit, non-convex set of pose uncertainty constraints. Our key contribution is SLUE (S-Lemma Uncertainty Estimation), a convex program to reduce this set to a single ellipsoidal uncertainty bound that is guaranteed to contain the true object pose with high probability. SLUE solves a relaxation of the minimum volume bounding ellipsoid problem inspired by the celebrated S-lemma. It requires no initial guess of the bound's shape or size and is guaranteed to contain the true object pose with high probability. For tighter uncertainty bounds at the same confidence, we extend SLUE to a sum-of-squares relaxation hierarchy which is guaranteed to converge to the minimum volume ellipsoidal uncertainty bound for a given set of keypoint constraints. We show this pose uncertainty bound can easily be projected to independent translation and axis-angle orientation bounds. We evaluate SLUE on two pose estimation datasets and a real-world drone tracking scenario. Compared to prior work, SLUE generates substantially smaller translation bounds and competitive orientation bounds. We release code at https://github.com/MIT-SPARK/PoseUncertaintySets.
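
As a rough illustration of the projection step mentioned in the abstract, the sketch below projects a generic ellipsoid onto a subset of its coordinates, for example the translation block of a stacked pose vector. It assumes the ellipsoid is written as $\{x : (x-c)^\top H (x-c) \leq 1\}$ with $H$ positive definite. The function name `project_ellipsoid`, the coordinate layout, and the numbers are hypothetical and are not taken from the paper or its released code. The sketch covers only linear coordinate projection and does not attempt the axis-angle orientation bound.

```python
import numpy as np

def project_ellipsoid(H, center, idx):
    """Shadow of {x : (x - c)^T H (x - c) <= 1} on the coordinates in idx.

    Returns (H_proj, c_proj) describing the projected ellipsoid in the
    same quadratic form. H must be symmetric positive definite.
    """
    H = np.asarray(H, dtype=float)
    center = np.asarray(center, dtype=float)
    E = np.linalg.inv(H)              # shape matrix: x = c + E^{1/2} u, |u| <= 1
    E_sub = E[np.ix_(idx, idx)]       # projection keeps the matching block of E
    return np.linalg.inv(E_sub), center[idx]

# Hypothetical 3D ellipsoid with semi-axes 2, 1, 3 along x, y, z.
H = np.diag([1 / 4, 1.0, 1 / 9])
c = np.array([0.5, 0.0, 2.0])

# Project onto the (x, z) coordinates.
H_xz, c_xz = project_ellipsoid(H, c, [0, 2])

# Half-widths of the axis-aligned box around the projected ellipsoid.
half_widths = np.sqrt(np.diag(np.linalg.inv(H_xz)))
print(c_xz, half_widths)
```

The projection works on the inverse matrix because an ellipsoid is the linear image of a unit ball, and a linear image followed by a coordinate selection is again a linear image of that ball. Taking the sub-block of $H$ directly would give a slice through the center, which is generally smaller than the shadow.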

Paper Structure

This paper contains 30 sections, 9 theorems, 58 equations, 9 figures, 5 tables.

Key Result

Proposition 1

Assume measurements of $N$ object keypoints of the form \ref{eq:conformal_meas} and noise bounded in infinity-norm with high probability as \ref{eq:conformal_boundedprob}. The true position $\mathbf{t}_\mathrm{gt}$ and orientation $\mathbf{R}_\mathrm{gt}$ of the object are contained in the following constraint with probability at least $\beta$. For arbitrary dependence among $\bm{\epsilon}_i$, $\beta \geq 1-

Figures (9)

  • Figure 1: Conformal Pose and Uncertainty Estimation. Given an RGB image of an object (a), we extract 2D semantic keypoints and conformal uncertainty sets (b) which contain the ground truth keypoint with high probability. These sets imply a non-convex set of quadratic constraints on the object pose. We use a generalization of the S-Lemma and a projection scheme to reduce this set to an explicit bound (c) containing the true object pose with high probability and centered at the pose estimate, highlighted in black.
  • Figure 2: Keypoint measurements. Given a 3D model with annotated 3D keypoints (left), we assume pixel detections of the location of each keypoint in the image frame (right). Pixel keypoint measurements also carry an uncertainty bound, shaded in blue.
  • Figure 3: Pose Uncertainty Constraint Set. The infinity-norm bounds on keypoint error (left) each imply a cone of backprojected 3D feasible keypoint positions (right). Combining the bounds for multiple keypoints and imposing object shape constraints yields an implicit pose uncertainty constraint set which contains many feasible poses.
  • Figure 4: Hierarchy of Bounding Ellipsoids. We solve for an ellipsoidal representation \ref{eq:ellipse_understandable}, in blue, of a set defined by several quadratic constraints \ref{eq:purse}, in purple, which may be non-convex. Our approach (SLUE) admits a hierarchy of ellipsoidal bounds guaranteed to converge to the minimum volume bound as relaxation order $\kappa$ tends to infinity. In this 2D toy example, the relaxation converges by $\kappa=3$.
  • Figure 5: Image-plane projections of ellipsoidal pose uncertainty. Plots show the set of possible poses in the second-order joint ellipsoidal bound for $\alpha=0.1$. Uncertainty is mostly concentrated along the optical axis. For CAST, we only show translation uncertainty about the pose estimate.
  • ...and 4 more figures
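
The keypoint bounds described in the Figure 3 caption can be made concrete with a small membership test: a candidate pose is feasible if every reprojected 3D keypoint lands within the infinity-norm pixel bound of its detection. The sketch below assumes an ideal pinhole camera with intrinsic matrix `K` and no lens distortion. The function name `pose_in_constraint_set` and all numerical values are hypothetical and are not drawn from the paper or its released code.

```python
import numpy as np

def pose_in_constraint_set(R, t, keypoints_3d, pixels, radii, K):
    """Check whether pose (R, t) satisfies every keypoint pixel bound.

    keypoints_3d: (N, 3) keypoints in the object frame
    pixels:       (N, 2) detected pixel locations
    radii:        (N,)   infinity-norm bound on each detection error
    K:            (3, 3) pinhole intrinsics (assumed, no distortion)
    """
    cam_pts = keypoints_3d @ R.T + t           # keypoints in the camera frame
    if np.any(cam_pts[:, 2] <= 0):             # a point behind the camera is infeasible
        return False
    uvw = cam_pts @ K.T                        # homogeneous pixel coordinates
    uv = uvw[:, :2] / uvw[:, 2:3]              # perspective division
    err = np.max(np.abs(uv - pixels), axis=1)  # infinity norm per keypoint
    return bool(np.all(err <= radii))

# Hypothetical setup: four keypoints, object 2 m in front of the camera.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
pts = np.array([[0.1, 0.1, 0.0],
                [-0.1, 0.1, 0.0],
                [0.0, -0.1, 0.1],
                [0.1, -0.1, -0.1]])
R_true = np.eye(3)
t_true = np.array([0.0, 0.0, 2.0])

# Noise-free detections generated from the true pose, with 2-pixel bounds.
uvw = (pts @ R_true.T + t_true) @ K.T
detections = uvw[:, :2] / uvw[:, 2:3]
bounds = np.full(len(pts), 2.0)

print(pose_in_constraint_set(R_true, t_true, pts, detections, bounds, K))
print(pose_in_constraint_set(R_true, t_true + np.array([0.1, 0.0, 0.0]),
                             pts, detections, bounds, K))
```

The perspective division makes each bound non-linear in the pose. Multiplying through by the positive depth turns it into constraints that are quadratic in the entries of the rotation and translation, which is consistent with the caption's picture of each bound as a cone of backprojected keypoint positions.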

Theorems & Definitions (13)

  • Proposition 1: Pose Uncertainty Constraint Set
  • proof
  • Proposition 2: Generalized S-Lemma
  • proof
  • Proposition 3: Bounding Ellipsoid
  • Proposition 4: SOS Generalized S-Lemma
  • proof
  • Proposition 5: Bounding Ellipsoid Hierarchy
  • Theorem 6: Hierarchy Convergence [Nie05optim-MinimalEnclosingEllipsoid, Tang24l4dc-setMembership]
  • Proposition 7
  • ...and 3 more