Novel Object 6D Pose Estimation with a Single Reference View
Jian Liu, Wei Sun, Kai Zeng, Jin Zheng, Hui Yang, Hossein Rahmani, Ajmal Mian, Lin Wang
TL;DR
This work tackles novel object 6D pose estimation using only a single reference view, addressing the scalability limitations of CAD-model and dense-reference approaches. It introduces SinRef-6D, which performs iterative object-space point-wise alignment guided by RGB and Points State Space Models (SSMs) that capture long-range spatial information with linear complexity, and then solves for the pose via weighted SVD. Key contributions include the integration of RGB and Points SSMs, an iterative focalization-and-alignment pipeline with two non-shared GeoTransformers, and strong empirical results across six public datasets and real-world scenes, where the CAD-free method performs competitively with CAD-based methods. Because it needs neither textured CAD models nor dense reference views, the method is practical for mobile and robotic deployments and scales to unseen objects. Limitations include difficulty with top-down views and reflective materials, which point to future work on robustness in such scenarios.
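To make the final pose-solving step concrete, the following is a minimal sketch of a weighted SVD (weighted Kabsch) solver, assuming point-wise correspondences and per-correspondence confidence weights are already available. The function name `weighted_svd_pose` and its arguments are hypothetical, and this is the standard closed-form solution rather than the SinRef-6D implementation, whose details are not given here.

```python
import numpy as np

def weighted_svd_pose(src, dst, weights):
    """Estimate R, t minimizing sum_i w_i * ||R @ src_i + t - dst_i||^2.

    src, dst: (N, 3) arrays of corresponding points (hypothetical inputs).
    weights:  (N,) non-negative confidence per correspondence.
    """
    w = weights / weights.sum()              # normalize weights to sum to 1
    src_centroid = w @ src                   # weighted centroids, shape (3,)
    dst_centroid = w @ dst
    src_c = src - src_centroid               # center both point sets
    dst_c = dst - dst_centroid
    H = (src_c * w[:, None]).T @ dst_c       # 3x3 weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_centroid - R @ src_centroid
    return R, t
```

With noise-free correspondences the solver recovers the ground-truth rotation and translation regardless of the weights; with noisy correspondences, low-weight pairs contribute proportionally less to the estimate.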
Abstract
Existing novel object 6D pose estimation methods typically rely on CAD models or dense reference views, both of which are difficult to acquire. Using only a single reference view is more scalable, but it is challenging because of large pose discrepancies and limited geometric and spatial information. To address these issues, we propose a Single-Reference-based novel object 6D (SinRef-6D) pose estimation method. Our key idea is to iteratively establish point-wise alignment in a common coordinate system based on state space models (SSMs). Specifically, iterative object-space point-wise alignment effectively handles large pose discrepancies, while our proposed RGB and Points SSMs capture long-range dependencies and spatial information from a single view, offering linear complexity and superior spatial modeling capability. Once pre-trained on synthetic data, SinRef-6D can estimate the 6D pose of a novel object using only a single reference view, without requiring retraining or a CAD model. Extensive experiments on six popular datasets and real-world robotic scenes demonstrate that SinRef-6D achieves performance on par with CAD-based and dense reference view-based methods, despite operating in the more challenging single-reference setting. Code will be released at https://github.com/CNJianLiu/SinRef-6D.
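The linear-complexity claim for SSMs can be illustrated with the generic discrete linear state space recurrence h_t = A h_{t-1} + B x_t, y_t = C h_t. The sketch below is an assumption-laden illustration: the abstract does not specify the SSM variant used by the RGB and Points SSMs, and the name `ssm_scan` and the matrices A, B, C are hypothetical placeholders.

```python
import numpy as np

def ssm_scan(A, B, C, x):
    """Run a discrete linear SSM over a sequence.

    A: (d_state, d_state), B: (d_state, d_in), C: (d_out, d_state).
    x: (T, d_in) input sequence. Returns y of shape (T, d_out).
    """
    h = np.zeros(A.shape[0])        # hidden state starts at zero
    ys = []
    for x_t in x:                   # one pass over the sequence: O(T) steps
        h = A @ h + B @ x_t         # state update carries long-range context
        ys.append(C @ h)            # per-step readout
    return np.stack(ys)
```

Each step touches only the fixed-size state, so the cost grows linearly with sequence length T, in contrast to the quadratic cost of full pairwise attention over T tokens.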
