A Certifiable Algorithm for Simultaneous Shape Estimation and Object Tracking

Lorenzo Shaikewitz, Samuel Ubellacker, Luca Carlone

TL;DR

The paper addresses category-level shape estimation and pose tracking from RGB-D keypoints by casting the problem as fixed-lag smoothing with a constant-twist motion model and an active shape model for shape variability. It contributes CAST$^\star$, a certifiably optimal solver based on a small SDP relaxation of a QCQP, and CAST$^\#$, an outlier-robust wrapper that combines compatibility pruning with graduated non-convexity. Experiments show that the relaxation is empirically tight, that the method tolerates substantial outlier rates, and that its accuracy is competitive on synthetic data, public datasets (YCBInEOAT, NOCS), and a real drone-tracking scenario. The work advances interpretable, certifiably optimal category-level tracking that remains robust to measurement noise and outliers, supporting reliable perception for robots in dynamic environments.

Abstract

Applications from manipulation to autonomous vehicles rely on robust and general object tracking to safely perform tasks in dynamic environments. We propose the first certifiably optimal category-level approach for simultaneous shape estimation and pose tracking of an object of known category (e.g., a car). Our approach uses 3D semantic keypoint measurements extracted from an RGB-D image sequence, and phrases the estimation as a fixed-lag smoothing problem. Temporal constraints enforce the object's rigidity (fixed shape) and smooth motion according to a constant-twist motion model. The solutions to this problem are the estimates of the object's state (poses, velocities) and shape (parameterized according to the active shape model) over the smoothing horizon. Our key contribution is to show that despite the non-convexity of the fixed-lag smoothing problem, we can solve it to certifiable optimality using a small-size semidefinite relaxation. We also present a fast outlier rejection scheme that filters out incorrect keypoint detections with shape and time compatibility tests, and wrap our certifiable solver in a graduated non-convexity scheme. We evaluate the proposed approach on synthetic and real data, showcasing its performance in a table-top manipulation scenario and a drone-based vehicle tracking application.
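To make the constant-twist motion model mentioned in the abstract concrete, the following is a minimal noise-free sketch in Python. The convention used here is an assumption for illustration and is not taken from the paper: the per-step translation `v` is expressed in the body frame and `dR` is the per-step relative rotation, so that $\mathbf{p}_{t+1} = \mathbf{p}_t + \mathbf{R}_t \mathbf{v}$ and $\mathbf{R}_{t+1} = \mathbf{R}_t \, \Delta\mathbf{R}$. The function names are hypothetical.

```python
import numpy as np

def rot_from_axis_angle(axis, angle):
    """Rodrigues' formula: rotation by `angle` radians about `axis`."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    # Skew-symmetric cross-product matrix of the unit axis.
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def propagate_constant_twist(p0, R0, v, dR, steps):
    """Roll out poses under a noise-free constant twist.

    Assumed convention (illustrative only): v is the per-step translation
    in the body frame and dR is the per-step relative rotation, so
    p_{t+1} = p_t + R_t v and R_{t+1} = R_t dR.
    """
    poses = [(np.asarray(p0, dtype=float), np.asarray(R0, dtype=float))]
    for _ in range(steps):
        p, R = poses[-1]
        poses.append((p + R @ v, R @ dR))
    return poses

# Unit forward motion with a 90 degree yaw per step traces a closed square.
poses = propagate_constant_twist(
    p0=np.zeros(3),
    R0=np.eye(3),
    v=np.array([1.0, 0.0, 0.0]),
    dR=rot_from_axis_angle([0.0, 0.0, 1.0], np.pi / 2),
    steps=4,
)
```

In a smoothing formulation, deviations from such a rollout would be penalized rather than forbidden, which is how the motion model acts as a smoothness prior instead of a hard constraint.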

Paper Structure

This paper contains 21 sections, 6 theorems, 31 equations, 6 figures, 5 tables.

Key Result

Proposition 1

For any positions and rotations $(\mathbf{p}_t, \mathbf{R}_t)$, the optimal shape coefficient solving eq:optprob_original is where we defined the following symbols:

Figures (6)

  • Figure 1: Overview of CAST$^\#$. We estimate the shape and track the pose of an object from a sequence of images picturing the object. Given 3D keypoint measurements obtained via a learning-based detector, we formulate a non-convex fixed-lag smoothing problem where the shape is parametrized using an active shape model and motion smoothness is enforced using a constant-twist motion model. We solve this problem via a tight and small-size semidefinite relaxation and wrap the method in an outlier rejection scheme to robustly estimate shape and pose over a fixed time horizon.
  • Figure 2: Active Shape Model. Known 3D models in the bottle category and their averages computed according to the active shape model. Vertices are the original models and edges are the average of the two vertices. The active shape model can represent any 3D geometry in the convex hull of its shape library through a point-wise weighted average.
  • Figure 3: Outlier Pruning. Most outliers are easy to identify via shape or time compatibility tests. Shape compatibility retains keypoints that are mutually within the convex hull of the known shape library. Time compatibility compares keypoint pairs over multiple observations and retains groups that preserve 3D distance over time, up to a tolerance $\epsilon$. We determine the largest set of compatible measurements via a mixed integer linear program.
  • Figure 4: Performance of CAST$^\star$ and CAST$^\#$ in Synthetic Experiments. Using the PASCAL3D+ aeroplane shape library, we generate synthetic measurements to test the robustness of CAST$^\star$ and CAST$^\#$ to measurement noise, process noise, and outliers. Plots show median and IQR of 500 runs.
  • Figure 5: Performance of CAST$^\star$ in synthetic experiments with increasing measurement noise. Robustness to measurement noise with CAST$^\star$ using the inverse of the simulated velocity covariance for the velocity weights $\omega_t$. The key difference between this plot and Figure 4(a) lies in the suboptimality gap plot, where CAST$^\star$ loses tightness quickly. Despite losing its optimality certificate, CAST$^\star$ maintains the lowest position, rotation, and shape errors.
  • ...and 1 more figure
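The active shape model described in the Figure 2 caption can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function name `active_shape`, the array layout `(K, N, 3)`, and the toy two-model library are assumptions. It computes the point-wise weighted average of the library models, with coefficients that are non-negative and sum to one so that the result stays in the convex hull of the library.

```python
import numpy as np

def active_shape(library, c):
    """Point-wise weighted average of library shapes.

    library: (K, N, 3) array holding K models with N keypoints each
             (hypothetical layout chosen for this sketch).
    c:       (K,) shape coefficients, non-negative and summing to 1.
    """
    library = np.asarray(library, dtype=float)
    c = np.asarray(c, dtype=float)
    if np.any(c < 0) or not np.isclose(c.sum(), 1.0):
        raise ValueError("shape coefficients must be non-negative and sum to 1")
    # Contract the model axis: sum_k c[k] * library[k] gives an (N, 3) shape.
    return np.tensordot(c, library, axes=1)

# Toy library: two models with two keypoints each.
library = np.array([
    [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]],   # short model
    [[0.0, 0.0, 0.0], [0.0, 0.0, 3.0]],   # tall model
])

# Equal weights give the midpoint of the two models,
# which corresponds to an "edge" in Figure 2.
midpoint = active_shape(library, [0.5, 0.5])
```

Setting one coefficient to 1 recovers the corresponding library model exactly, which corresponds to a "vertex" in the figure.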

Theorems & Definitions (8)

  • Proposition 1: Optimal Shape
  • proof
  • Proposition 2: QCQP Formulation
  • proof
  • Corollary 3: Shor's Relaxation
  • Proposition 4: Shape Compatibility Test
  • Proposition 5: Time Compatibility Test
  • Proposition 6: Largest Set of Compatible Measurements
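The idea behind the time compatibility test and the largest compatible set (Propositions 5 and 6) can be illustrated with a simplified Python sketch. Several assumptions apply: only two observation times are used, the tolerance is applied directly as $\epsilon$ (the exact bound in the paper may differ), and the largest compatible set is found by brute-force enumeration instead of the mixed integer linear program used in the paper. All function names are hypothetical.

```python
import itertools
import numpy as np

def time_compatible(y_a, y_b, i, j, eps):
    """Keypoints i and j are compatible if their 3D distance is preserved
    between the two observations, up to tolerance eps (assumed bound)."""
    d_a = np.linalg.norm(y_a[i] - y_a[j])
    d_b = np.linalg.norm(y_b[i] - y_b[j])
    return abs(d_a - d_b) <= eps

def largest_compatible_set(y_a, y_b, eps):
    """Largest set of pairwise time-compatible keypoints.

    Brute-force search over subsets, largest first. This stands in for the
    paper's MILP and is only practical for a handful of keypoints.
    """
    n = len(y_a)
    for size in range(n, 0, -1):
        for subset in itertools.combinations(range(n), size):
            pairs = itertools.combinations(subset, 2)
            if all(time_compatible(y_a, y_b, i, j, eps) for i, j in pairs):
                return list(subset)
    return []

# Four keypoints observed at time a.
y_a = np.array([[0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]])

# Time b: a rigid motion (90 degree yaw plus a translation) of the same points.
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0]])
y_b = y_a @ Rz.T + np.array([2.0, 0.0, 0.0])
y_b[3] += np.array([0.0, 0.0, 5.0])  # corrupt keypoint 3 to simulate an outlier

inliers = largest_compatible_set(y_a, y_b, eps=0.05)
```

Because rigid motion preserves pairwise distances, the three uncorrupted keypoints remain mutually compatible while the displaced keypoint fails the test against each of them.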