Category-Level Object Shape and Pose Estimation in Less Than a Millisecond

Lorenzo Shaikewitz, Tim Nguyen, Luca Carlone

TL;DR

A fast local solver for shape and pose estimation that requires only category-level object priors and admits an efficient certificate of global optimality. A learned front-end detects sparse, category-level semantic keypoints on the target object, which serve as the solver's input.

Abstract

Object shape and pose estimation is a foundational robotics problem, supporting tasks from manipulation to scene understanding and navigation. We present a fast local solver for shape and pose estimation which requires only category-level object priors and admits an efficient certificate of global optimality. Given an RGB-D image of an object, we use a learned front-end to detect sparse, category-level semantic keypoints on the target object. We represent the target object's unknown shape using a linear active shape model and pose a maximum a posteriori optimization problem to solve for position, orientation, and shape simultaneously. Expressed in unit quaternions, this problem admits first-order optimality conditions in the form of an eigenvalue problem with eigenvector nonlinearities. Our primary contribution is to solve this problem efficiently with self-consistent field iteration, which only requires computing a 4-by-4 matrix and finding its minimum eigenvalue-vector pair at each iterate. Solving a linear system for the corresponding Lagrange multipliers gives a simple global optimality certificate. One iteration of our solver runs in about 100 microseconds, enabling fast outlier rejection. We test our method on synthetic data and a variety of real-world settings, including two public datasets and a drone tracking scenario. Code is released at https://github.com/MIT-SPARK/Fast-ShapeAndPose.
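The self-consistent field (SCF) iteration named in the abstract can be illustrated on a toy problem. The sketch below is not the authors' solver: the matrix-building function `toy_matrix`, the constant matrix `A`, and the function name `scf_min_eig` are hypothetical stand-ins chosen only so the example runs. It shows the general pattern for an eigenvalue problem with eigenvector nonlinearities, $\mathbf{H}(\mathbf{q})\mathbf{q} = \lambda\mathbf{q}$: build a 4-by-4 matrix from the current iterate, take its minimum eigenvalue-vector pair, and repeat.

```python
import numpy as np


def scf_min_eig(build_matrix, q0, max_iters=100, tol=1e-10):
    """Generic SCF loop for H(q) q = lam q with ||q|| = 1 (illustrative only)."""
    q = np.asarray(q0, dtype=float)
    q = q / np.linalg.norm(q)
    for _ in range(max_iters):
        H = build_matrix(q)
        # eigh returns eigenvalues in ascending order, so column 0 is the
        # eigenvector paired with the minimum eigenvalue.
        _, vecs = np.linalg.eigh(H)
        q_next = vecs[:, 0]
        # Eigenvectors are only defined up to sign; keep the sign consistent
        # with the previous iterate so the step size is meaningful.
        if q_next @ q < 0:
            q_next = -q_next
        step = np.linalg.norm(q_next - q)
        q = q_next
        if step < tol:
            break
    # Report how well the final iterate satisfies the nonlinear eigenproblem.
    H = build_matrix(q)
    lam = q @ H @ q
    residual = np.linalg.norm(H @ q - lam * q)
    return q, lam, residual


# Hypothetical toy problem: a fixed symmetric matrix plus a small
# iterate-dependent term. This is NOT the matrix from the paper.
A = np.array([
    [1.0, 0.1, 0.0, 0.0],
    [0.1, 2.0, 0.1, 0.0],
    [0.0, 0.1, 3.0, 0.1],
    [0.0, 0.0, 0.1, 4.0],
])


def toy_matrix(q):
    return A + 0.1 * np.diag(q ** 2)


q_star, lam_star, res = scf_min_eig(toy_matrix, np.ones(4))
print("eigenvalue:", lam_star, "residual:", res)
```

In this toy setting the nonlinear term is small relative to the eigenvalue gap of `A`, so the iteration settles quickly. In general, SCF is a local method, and convergence depends on the problem and the starting point.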

Paper Structure

This paper contains 21 sections, 3 theorems, 40 equations, 5 figures, 6 tables, 1 algorithm.

Key Result

Lemma 1

Let the unit quaternion $\mathbf{q}\in\mathbb{S}^3$ represent the same rotation as the matrix $\mathbf{R}\in\mathrm{SO}(3)$. For vectors $\mathbf{x}, \mathbf{y}\in\mathbb{R}^3$: $\mathbf{x}^\mathsf{T} \mathbf{R} \mathbf{y} = -\mathbf{q}^\mathsf{T} \mathbf{\Omega}_l(\mathbf{x})\mathbf{\Omega}_
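Lemma 1 rewrites the bilinear form $\mathbf{x}^\mathsf{T}\mathbf{R}\mathbf{y}$ as a quadratic form in $\mathbf{q}$. A quadratic form is unchanged under $\mathbf{q}\to-\mathbf{q}$, which matches the fact that $\mathbf{q}$ and $-\mathbf{q}$ encode the same rotation. The sketch below checks this numerically using the standard quaternion-to-rotation-matrix formula. The scalar-first ordering $(w, x, y, z)$ and the function name `quat_to_rot` are assumptions made for this example; the paper's own convention may differ, and the $\mathbf{\Omega}$ matrices from the lemma are not reproduced here.

```python
import numpy as np


def quat_to_rot(q):
    """Rotation matrix for a quaternion in (w, x, y, z) order (assumed convention)."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


# A 90-degree rotation about the z-axis.
q = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
R = quat_to_rot(q)

# Arbitrary test vectors for the bilinear form x^T R y.
x = np.array([0.3, -1.2, 0.5])
y = np.array([1.0, 0.4, -0.7])

# The bilinear form takes the same value for q and -q.
form_pos = x @ quat_to_rot(q) @ y
form_neg = x @ quat_to_rot(-q) @ y
print(form_pos, form_neg)
```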

Figures (5)

  • Figure 1: Overview of Method. Given 3D keypoint detections (a), on an RGB-D image, and a category-level shape library (b), we use self-consistent field iteration (c), to estimate the shape and pose of an object (d).
  • Figure 2: Stereographic projections of self-consistent field iterates. Beginning from a unit quaternion $\mathbf{q}_0\in\mathbb{S}^3$, SCF rapidly converges to a local stationary point. Left, a single SCF trajectory. Right, unit quaternion iterates stereographically projected into the volume of the $3$-dimensional unit ball (see the appendix) and colored by which of the two local minima SCF converges to. Nearby starting points tend to converge to the same local minimum except at the distinct boundary. Plots show synthetic data with high measurement noise ($\sigma_m = 5$).
  • Figure 3: Overview of experiments. We test on a variety of datasets and synthetic data (not pictured). Left, NOCS-REAL275 [Wang19-normalizedCoordinate] contains common household categories including mugs and cameras. Upper right, the CAST drone dataset [Shaikewitz24ral-CAST] includes pictures from an aerial quadcopter following a small racecar. Lower right, ApolloCar3D [Song19-apollocar3d] contains real-world autonomous driving scenes. Sample pose estimates are highlighted in color.
  • Figure 4: Distribution of rotation errors for Gauss-Newton, SCF, $\text{SCF}^\star$, and $\text{PACE}^\star$. GN, SCF, and PACE have nearly identical performance although SCF runs significantly faster. $\text{SCF}^\star$ and $\text{PACE}^\star$ show only certifiably optimal estimates. $\text{SCF}^\star$ consistently filters out the worst estimates.
  • Figure 5: Distribution of Rotation Errors for Larger Shape Library. For $K>N$ the performance depends heavily on choice of regularization $\lambda$. For $\lambda=1.0$, GN, SCF, and PACE have very similar rotation accuracy across noise scales. $\text{SCF}^\star$ and $\text{PACE}^\star$ show only the globally optimal estimates.
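Figure 2 visualizes unit quaternions by projecting them stereographically into the 3-dimensional unit ball. The exact projection used in the paper's appendix is not reproduced here. The sketch below shows one common choice, under the assumptions of a scalar-first $(w, x, y, z)$ ordering and a sign flip to the $w \geq 0$ hemisphere (valid because $\mathbf{q}$ and $-\mathbf{q}$ represent the same rotation). The function name `stereographic_to_ball` is hypothetical.

```python
import numpy as np


def stereographic_to_ball(q):
    """Map a unit quaternion (w, x, y, z) to a point in the closed unit ball.

    Assumed convention: flip to the w >= 0 hemisphere, then project from the
    pole at w = -1, giving p = (x, y, z) / (1 + w).
    """
    q = np.asarray(q, dtype=float)
    q = q / np.linalg.norm(q)
    if q[0] < 0:
        q = -q  # q and -q encode the same rotation
    # With w >= 0 the denominator is at least 1, so no division by zero.
    return q[1:] / (1.0 + q[0])


# Project a batch of random unit quaternions (fixed seed for repeatability).
rng = np.random.default_rng(0)
samples = rng.normal(size=(1000, 4))
points = np.array([stereographic_to_ball(s) for s in samples])
print("largest radius:", np.linalg.norm(points, axis=1).max())
```

Under this convention the identity rotation maps to the origin, and rotations by 180 degrees land on the boundary of the ball.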

Theorems & Definitions (3)

  • Lemma 1 [Yang19iccv-QUASAR]
  • Proposition 2: Optimal Shape and Position [Shi23tro-PACE]
  • Proposition 3: Eigenproblem for Local Solutions