GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence
Van Nguyen Nguyen, Thibault Groueix, Mathieu Salzmann, Vincent Lepetit
TL;DR
GigaPose addresses the need for fast, robust CAD-based coarse 6D pose estimation of novel objects from RGB images. It decouples the pose into the out-of-plane rotation, captured with 162 templates, and the four remaining DoFs (in-plane rotation, 2D translation, and scale), recovered from patch correspondences. A ViT-based network ${\bf F}_{\text{ae}}$, trained with a local contrastive loss, extracts local features used to match templates to segmented input regions, while a second network ${\bf F}_{\text{ist}}$ with two lightweight MLPs predicts the relative 2D scale $s$ and in-plane rotation $\alpha$ for each 2D–2D patch match. Together with the positions of the matched patches, a single correspondence then determines the 2D affine transform ${\bf M}_{t \rightarrow q}$ from template to query, and a RANSAC loop selects the most consistent hypothesis. On the seven core BOP datasets, GigaPose achieves state-of-the-art accuracy while being about $35\times$ faster than MegaPose for coarse estimation, and it is markedly more robust to segmentation errors. The method remains compatible with refinement networks and can use 3D models predicted from a single image (e.g., with Wonder3D), reducing the reliance on CAD models and making real-time 6D pose estimation of novel objects more practical for industrial deployment.
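The TL;DR states that one patch correspondence, together with the predicted scale $s$ and in-plane rotation $\alpha$, is enough to fix ${\bf M}_{t \rightarrow q}$. The sketch below illustrates that idea under stated assumptions: ${\bf M}_{t \rightarrow q}$ is modeled as a 2D similarity transform $x_q = s R(\alpha) x_t + t$, the per-match $s$ and $\alpha$ are taken as already predicted, and the `Correspondence` class, the function names, and the 5-pixel inlier threshold are all hypothetical. It uses an exhaustive one-point hypothesize-and-verify loop as a stand-in for RANSAC; it is not the released GigaPose implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Affine = List[List[float]]  # 2x3 matrix [[a, -b, tx], [b, a, ty]]


@dataclass
class Correspondence:
    """One template-to-query patch match plus (hypothetical) per-match predictions."""
    template_xy: Point  # patch center in the template image
    query_xy: Point     # matched patch center in the query image
    scale: float        # predicted relative 2D scale s
    angle: float        # predicted in-plane rotation alpha, in radians


def affine_from_one(c: Correspondence) -> Affine:
    """Similarity transform x_q = s * R(alpha) * x_t + t implied by one match."""
    a = c.scale * math.cos(c.angle)
    b = c.scale * math.sin(c.angle)
    xt, yt = c.template_xy
    xq, yq = c.query_xy
    # Pick t so that this match's template point lands exactly on its query point.
    tx = xq - (a * xt - b * yt)
    ty = yq - (b * xt + a * yt)
    return [[a, -b, tx], [b, a, ty]]


def apply(m: Affine, p: Point) -> Point:
    """Map a template-image point into the query image."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


def best_affine(corrs: Sequence[Correspondence],
                inlier_px: float = 5.0) -> Tuple[Affine, int]:
    """One hypothesis per match; keep the one that the most matches agree with."""
    best_m: Affine = []
    best_inliers = -1
    for hyp in corrs:
        m = affine_from_one(hyp)
        inliers = 0
        for c in corrs:
            px, py = apply(m, c.template_xy)
            if math.hypot(px - c.query_xy[0], py - c.query_xy[1]) <= inlier_px:
                inliers += 1
        if inliers > best_inliers:
            best_m, best_inliers = m, inliers
    return best_m, best_inliers


# Toy data: four matches consistent with s=2, alpha=90 deg, t=(10, 20), plus one outlier.
good = [((0, 0), (10, 20)), ((1, 0), (10, 22)), ((0, 1), (8, 20)), ((3, 4), (2, 26))]
corrs = [Correspondence(t, q, 2.0, math.pi / 2) for t, q in good]
corrs.append(Correspondence((5, 5), (100, 100), 1.0, 0.0))
m, n_inliers = best_affine(corrs)
print(n_inliers)  # 4
```

Because a single match fixes all four parameters, each hypothesis is cheap, so this sketch simply scores every match; with many matches, random subsampling as in standard RANSAC would bound the cost.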
Abstract
We present GigaPose, a fast, robust, and accurate method for CAD-based novel object pose estimation in RGB images. GigaPose first leverages discriminative "templates", rendered images of the CAD models, to recover the out-of-plane rotation and then uses patch correspondences to estimate the four remaining parameters. Our approach samples templates in only a two-degrees-of-freedom space instead of the usual three and matches the input image to the templates using fast nearest-neighbor search in feature space, resulting in a speedup factor of 35x compared to the state of the art. Moreover, GigaPose is significantly more robust to segmentation errors. Our extensive evaluation on the seven core datasets of the BOP challenge demonstrates that it achieves state-of-the-art accuracy and can be seamlessly integrated with existing refinement methods. Additionally, we show the potential of GigaPose with 3D models predicted by recent work on 3D reconstruction from a single image, relaxing the need for CAD models and making 6D object pose estimation much more convenient. Our source code and trained models are publicly available at https://github.com/nv-nguyen/gigaPose
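The abstract says the input image is matched to the templates by nearest-neighbor search in feature space. A minimal sketch of that retrieval step, assuming each template and the query segment are summarized by a single feature vector compared with cosine similarity; GigaPose itself works with local ViT patch features, so the descriptor format, the brute-force search, and the names `build_bank` and `nearest_template` are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a feature vector to unit L2 norm so dot products become cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def build_bank(template_features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Normalize all template descriptors once, offline (one per out-of-plane view)."""
    return [normalize(t) for t in template_features]


def nearest_template(query: Sequence[float],
                     bank: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Brute-force nearest neighbor: index and cosine similarity of the closest template."""
    q = normalize(query)
    best_idx, best_sim = -1, -math.inf
    for i, t in enumerate(bank):
        sim = sum(a * b for a, b in zip(q, t))
        if sim > best_sim:
            best_idx, best_sim = i, sim
    return best_idx, best_sim


# Toy bank of three template descriptors and one query descriptor.
bank = build_bank([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]])
idx, sim = nearest_template([0.5, 1.0, 0.1], bank)
print(idx)  # 2
```

Since the template descriptors are computed once per object, matching at test time reduces to a single pass over a small bank; an approximate nearest-neighbor index could replace the loop when the bank grows.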
