GMatch: A Lightweight, Geometry-Constrained Keypoint Matcher for Zero-Shot 6DoF Pose Estimation in Robotic Grasp Tasks
Ming Yang, Haoran Li
TL;DR
GMatch tackles the challenge of zero-shot 6DoF pose estimation on resource-constrained robots by revisiting keypoint matching and introducing a geometry-constrained incremental matcher. It formulates correspondence as a branch-and-bound search that enforces geometric completeness via pairwise distances and scalar triple products, augmented with an opacity constraint to prevent flip-overs, and operates with a tunable feature-distance threshold $\epsilon_f$ and geometry tolerance $\epsilon_c$. Across HOPE and YCB-Video, GMatch coupled with SIFT achieves competitive accuracy, outperforming several feature-based and registration baselines and approaching state-of-the-art zero-shot methods on texture-rich objects, while running efficiently on CPU-only hardware. A real-world LoCoBot grasp demonstration validates its practicality, illustrating a lightweight, white-box solution that remains flexible to descriptor choice and potentially scalable with improved descriptors. The work highlights a practical direction for robust yet efficient pose estimation in embedded robotic systems, with future work targeting descriptor quality and additional geometric constraints to broaden robustness.
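The geometric test described above can be illustrated with a minimal sketch. This is not the authors' implementation: it only shows how pairwise distances and scalar triple products might be compared when one candidate correspondence is added to an already accepted set. The function names, the absolute use of the tolerance $\epsilon_c$ (`eps_c`) on distances, and the separate volume tolerance `eps_v` are assumptions made for illustration. The branch-and-bound search and the opacity constraint are not shown.

```python
from itertools import combinations

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dist(a, b):
    d = sub(a, b)
    return dot(d, d) ** 0.5

def triple(o, a, b, c):
    """Scalar triple product (signed volume) of a-o, b-o, c-o."""
    return dot(sub(a, o), cross(sub(b, o), sub(c, o)))

def is_consistent(accepted, candidate, eps_c, eps_v):
    """Check one candidate (p, q) against accepted (p, q) pairs.

    p is a 3D model keypoint, q the matched 3D scene keypoint.
    eps_c bounds the distance mismatch; eps_v (hypothetical) bounds
    the signed-volume mismatch.
    """
    p_new, q_new = candidate
    # A rigid motion preserves every pairwise distance.
    for p, q in accepted:
        if abs(dist(p, p_new) - dist(q, q_new)) > eps_c:
            return False
    # It also preserves signed volumes; a mirrored match flips the sign.
    for (p1, q1), (p2, q2), (p3, q3) in combinations(accepted, 3):
        v_model = triple(p1, p2, p3, p_new)
        v_scene = triple(q1, q2, q3, q_new)
        if abs(v_model - v_scene) > eps_v:
            return False
    return True

# Scene points are the model points rotated 90 degrees about z, then shifted.
accepted = [((0, 0, 0), (1, 2, 3)),
            ((1, 0, 0), (1, 3, 3)),
            ((0, 1, 0), (0, 2, 3))]
good = ((0, 0, 1), (1, 2, 4))      # consistent with the rigid motion
mirrored = ((0, 0, 1), (1, 2, 2))  # same distances, opposite handedness
print(is_consistent(accepted, good, 1e-6, 1e-6))
print(is_consistent(accepted, mirrored, 1e-6, 1e-6))
```

The mirrored candidate passes every distance check, which is why a distance-only test is not sufficient and a signed quantity such as the scalar triple product is needed to reject reflections.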
Abstract
6DoF object pose estimation is fundamental to robotic grasp tasks. While recent learning-based methods achieve high accuracy, their computational demands hinder deployment on resource-constrained mobile platforms. In this work, we revisit the classical keypoint matching paradigm and propose GMatch, a lightweight, geometry-constrained keypoint matcher that runs efficiently on embedded CPU-only platforms. GMatch works with keypoint descriptors and uses a set of geometric constraints to resolve the inherent ambiguities between features extracted by those descriptors, yielding globally consistent correspondences from which the 6DoF pose can be readily solved. We benchmark GMatch on the HOPE and YCB-Video datasets, where it outperforms existing keypoint matchers (both feature-based and geometry-based) across three commonly used descriptors and approaches the SOTA zero-shot method on texture-rich objects while running on far more modest hardware. The method is further deployed on a LoCoBot mobile manipulator, enabling a one-shot grasp pipeline that demonstrates high task success rates in real-world experiments. In short, owing to its lightweight and white-box nature, GMatch offers a practical solution for resource-limited robotic systems; although it is currently bottlenecked by descriptor quality, the framework presents a promising direction towards robust yet efficient pose estimation. Code will be released soon under the Mozilla Public License.
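To illustrate why the pose follows directly once consistent 3D-3D correspondences are available, the sketch below uses the standard SVD-based least-squares alignment (the Kabsch algorithm). The abstract does not state which solver GMatch uses, so this is an assumed stand-in, and the function name `rigid_pose_from_matches` is hypothetical.

```python
import numpy as np

def rigid_pose_from_matches(src, dst):
    """Least-squares rotation R and translation t with dst ≈ R @ src + t.

    src, dst: (N, 3) arrays of matched 3D points, N >= 3 and not collinear.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Force det(R) = +1 so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Model points and the same points rotated 90 degrees about z, then shifted.
model = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
scene = [(1, 2, 3), (1, 3, 3), (0, 2, 3), (1, 2, 4)]
R, t = rigid_pose_from_matches(model, scene)
print(np.round(R, 6))
print(np.round(t, 6))
```

The recovered `R` and `t` together give the six degrees of freedom. With noisy matches the same routine returns the least-squares fit, which is one reason globally consistent correspondences matter: a single wrong match can bias the estimate.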
