ZeroBP: Learning Position-Aware Correspondence for Zero-shot 6D Pose Estimation in Bin-Picking
Jianqiu Chen, Zikun Zhou, Xin Li, Ye Zheng, Tianpeng Bao, Zhenyu He
TL;DR
ZeroBP tackles zero-shot 6D pose estimation in bin-picking by learning Position-Aware Correspondence (PAC) between scene instances and CAD models. It introduces a global positional encoding based on multiplicative directional vectors and a bidirectional Position-Aware Cross-Attention (PACA) that fuses global position with local features, enabling robust correspondence for textureless parts with ambiguous local shapes. A coarse-to-fine strategy aligns superpoints and then dense points: an initial pose is estimated from local features alone, after which the pose and the global positions are refined alternately, with each pose update solved by weighted SVD for a rotation $R \in SO(3)$ and a translation $t \in \mathbb{R}^3$. Trained on a large synthetic dataset and evaluated on the real-world ROBI dataset, ZeroBP improves average recall of correct poses by 9.1% over state-of-the-art zero-shot methods and is competitive with object-specific approaches, which promises faster deployment to novel workpieces without retraining.
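To make the weighted SVD step concrete, the following is a minimal sketch of the standard weighted Kabsch (orthogonal Procrustes) solution for a rigid pose from weighted point correspondences. It is not taken from the ZeroBP implementation: the function name `weighted_svd_pose`, its arguments, and the use of NumPy are assumptions made for illustration, and the correspondence weights are simply taken as given.

```python
import numpy as np


def weighted_svd_pose(src, dst, weights):
    """Return (R, t) minimizing sum_i w_i * ||R @ src_i + t - dst_i||^2.

    src, dst: (N, 3) arrays of corresponding points (hypothetical inputs,
              e.g. CAD-model points and matched scene points).
    weights:  (N,) non-negative correspondence weights.
    """
    w = weights / weights.sum()            # normalize weights to sum to 1
    src_c = (w[:, None] * src).sum(axis=0)  # weighted centroid of src
    dst_c = (w[:, None] * dst).sum(axis=0)  # weighted centroid of dst

    # Weighted cross-covariance H = sum_i w_i (src_i - src_c)(dst_i - dst_c)^T
    H = (src - src_c).T @ (w[:, None] * (dst - dst_c))

    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that det(R) = +1 (a rotation, not a reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

In an alternating scheme like the one described above, a solver of this kind would be called once per iteration with the current correspondences and their confidence weights, and the resulting pose would then be used to update the position-dependent quantities before the next call.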
Abstract
Bin-picking is a practical and challenging robotic manipulation task in which accurate 6D pose estimation plays a pivotal role. The workpieces in bin-picking are typically textureless and randomly stacked in a bin, which poses a significant challenge to 6D pose estimation. Existing solutions are typically learning-based methods that require object-specific training, so their deployment to novel workpieces is severely limited by the need for data collection and model retraining. Zero-shot 6D pose estimation is a potential way to address this deployment bottleneck. However, existing zero-shot 6D pose estimation methods rely on feature matching to establish point-to-point correspondences for pose estimation, which is less effective for workpieces with textureless appearances and ambiguous local regions. In this paper, we propose ZeroBP, a zero-shot pose estimation framework designed specifically for the bin-picking task. ZeroBP learns Position-Aware Correspondence (PAC) between the scene instance and its CAD model, leveraging both local features and global positions to resolve the mismatches caused by ambiguous regions with similar shapes and appearances. Extensive experiments on the ROBI dataset demonstrate that ZeroBP outperforms state-of-the-art zero-shot pose estimation methods, achieving an improvement of 9.1% in average recall of correct poses.
