HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation
Yongliang Lin, Yongzhi Su, Praveen Nathan, Sandeep Inuganti, Yan Di, Martin Sundermeyer, Fabian Manhardt, Didier Stricker, Jason Rambach, Yu Zhang
TL;DR
HiPose is a real-time RGB-D 6DoF object pose estimator that removes the need for refinement by learning dense 3D-3D correspondences through a hierarchical binary surface encoding. The method uses a coarse-to-fine, RANSAC-free pipeline with hierarchical correspondence pruning, combining a bidirectional CNN and RandLA-Net fusion backbone with a Kabsch solver to progressively refine the pose and discard outliers. Across LM-O, YCB-V, and T-LESS, HiPose achieves state-of-the-art or competitive accuracy without rendering-based refinement, while being approximately 40x faster than refinement-based counterparts. Trained primarily on synthetic data, the approach is robust to depth noise and occlusion, making it suitable for real-time, depth-enabled robotics and AR applications.
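To illustrate the Kabsch solver named above, here is a minimal NumPy sketch of the standard closed-form least-squares rigid alignment between two sets of matched 3D points. It is not taken from the HiPose code; the function name `kabsch` and the array layout (one point per row) are assumptions made for this example.

```python
import numpy as np

def kabsch(src, dst):
    """Return (R, t) minimizing sum ||R @ src_i + t - dst_i||^2.

    src, dst: (N, 3) arrays of matched 3D points (illustrative layout).
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

In a correspondence-based pipeline, `src` would hold model-frame points and `dst` the matched camera-frame points back-projected from depth, so that `R` and `t` give the object pose.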
Abstract
In this work, we present a novel dense-correspondence method for 6DoF object pose estimation from a single RGB-D image. While many existing data-driven methods achieve impressive performance, they tend to be time-consuming because they rely on rendering-based refinement. To circumvent this limitation, we present HiPose, which establishes 3D-3D correspondences in a coarse-to-fine manner with a hierarchical binary surface encoding. Unlike previous dense-correspondence methods, we estimate the correspondence surface by employing point-to-surface matching and iteratively constricting the surface until it becomes a correspondence point, while gradually removing outliers. Extensive experiments on the public benchmarks LM-O, YCB-V, and T-LESS demonstrate that our method surpasses all refinement-free methods and is even on par with expensive refinement-based approaches. Crucially, our approach is computationally efficient and enables real-time-critical applications with high accuracy requirements.
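The coarse-to-fine idea of constricting a surface down to a single point can be pictured with a toy example. The sketch below assumes that each model vertex carries a fixed-length binary code whose prefixes identify progressively smaller surface patches. The helper `candidate_vertices`, the 3-bit codes, and the predicted bits are all hypothetical, and the point-to-surface matching and outlier pruning steps are omitted.

```python
import numpy as np

def candidate_vertices(vertex_codes, predicted_bits, level):
    """Indices of vertices whose first `level` bits match the predicted prefix."""
    prefix = np.asarray(predicted_bits[:level])
    return np.flatnonzero((vertex_codes[:, :level] == prefix).all(axis=1))

# Toy model: 8 vertices, each with a unique 3-bit code (made-up data).
vertex_codes = np.array([[(i >> b) & 1 for b in (2, 1, 0)] for i in range(8)])
predicted = [1, 0, 1]  # bits predicted for one observed point (made up)

# Each additional bit halves the candidate patch until one vertex remains.
for level in range(4):
    print(level, candidate_vertices(vertex_codes, predicted, level))
```

With no bits used, every vertex is a candidate; after all three bits, the patch has shrunk to a single vertex, which serves as the correspondence point.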
