A Certifiable Algorithm for Simultaneous Shape Estimation and Object Tracking
Lorenzo Shaikewitz, Samuel Ubellacker, Luca Carlone
TL;DR
The paper addresses category-level shape estimation and pose tracking from RGB-D keypoints by casting the problem as fixed-lag smoothing, with a constant-twist motion model and an active shape model that captures shape variability within a category. It contributes CAST$^\star$, a certifiably optimal solver based on a small semidefinite (SDP) relaxation of a quadratically constrained quadratic program (QCQP), and CAST$^\#$, an outlier-robust wrapper that combines compatibility-based pruning with Graduated Non-Convexity. The relaxation is empirically tight, and the method is robust to substantial outlier contamination and competitive in accuracy on synthetic data, public datasets (YCBInEOAT, NOCS), and a real drone-tracking scenario. The work advances interpretable, certifiably optimal category-level tracking that tolerates measurement noise and outliers, supporting reliable robot perception in dynamic environments.
Abstract
Applications from manipulation to autonomous vehicles rely on robust and general object tracking to safely perform tasks in dynamic environments. We propose the first certifiably optimal category-level approach for simultaneous shape estimation and pose tracking of an object of known category (e.g., a car). Our approach uses 3D semantic keypoint measurements extracted from an RGB-D image sequence, and phrases the estimation as a fixed-lag smoothing problem. Temporal constraints enforce the object's rigidity (fixed shape) and smooth motion according to a constant-twist motion model. The solutions to this problem are the estimates of the object's state (poses, velocities) and shape (parameterized according to the active shape model) over the smoothing horizon. Our key contribution is to show that despite the non-convexity of the fixed-lag smoothing problem, we can solve it to certifiable optimality using a small-size semidefinite relaxation. We also present a fast outlier rejection scheme that filters out incorrect keypoint detections with shape and time compatibility tests, and wrap our certifiable solver in a graduated non-convexity scheme. We evaluate the proposed approach on synthetic and real data, showcasing its performance in a table-top manipulation scenario and a drone-based vehicle tracking application.
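To make the two modeling ingredients named in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes the common form of the active shape model (keypoints as a weighted combination of a library of CAD keypoint sets) and a discrete-time, body-frame constant-twist model (the same relative rotation and body-frame translation applied at every step). The function names, array layouts, and the body-frame convention are assumptions introduced here for illustration; the abstract does not specify them.

```python
import numpy as np

def active_shape(library, c):
    """Combine K library keypoint sets (K, N, 3) with shape coefficients c (K,).

    Assumed convention: the object's keypoints are sum_k c[k] * library[k].
    """
    return np.tensordot(c, library, axes=1)  # result has shape (N, 3)

def rotation_z(theta):
    """Rotation matrix for an angle theta (radians) about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def propagate_constant_twist(R0, p0, dR, v, steps):
    """Roll a pose forward under an assumed body-frame constant twist.

    Update rule (assumption): p_{t+1} = p_t + R_t v,  R_{t+1} = R_t dR,
    where dR and v stay fixed over the whole horizon.
    """
    R, p = R0, p0
    poses = [(R, p)]
    for _ in range(steps):
        p = p + R @ v
        R = R @ dR
        poses.append((R, p))
    return poses

def predict_keypoints(library, c, R, p):
    """Noise-free keypoints in the sensor frame: rotate the shape, then translate."""
    return active_shape(library, c) @ R.T + p
```

As a usage example, calling `propagate_constant_twist(np.eye(3), np.zeros(3), rotation_z(np.pi / 2), np.array([1.0, 0.0, 0.0]), 4)` traces a closed square: a fixed quarter turn and a fixed unit body-frame step bring the pose back to its start after four steps. An estimator of the kind described above would fit the shape coefficients, poses, and the constant twist to noisy keypoint detections over the smoothing horizon; this sketch only covers the forward model.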
