
ZeroBP: Learning Position-Aware Correspondence for Zero-shot 6D Pose Estimation in Bin-Picking

Jianqiu Chen, Zikun Zhou, Xin Li, Ye Zheng, Tianpeng Bao, Zhenyu He

TL;DR

ZeroBP tackles zero-shot 6D pose estimation in bin-picking by learning Position-Aware Correspondence (PAC) between scene instances and their CAD models. It introduces a global positional encoding built from multiplicative directional vectors and a bidirectional Position-Aware Cross-Attention (PACA) that fuses global positions with local features, enabling robust correspondence on textureless parts with ambiguous local shapes. Correspondences are established coarse-to-fine, from superpoints to dense points. An initial pose is estimated from local features, after which the pose ($R \in SO(3)$, $t \in \mathbb{R}^3$) and the global positions are refined alternately, with pose updates solved by weighted SVD. Trained on a large synthetic dataset and evaluated on the real-world ROBI dataset, ZeroBP improves the average recall of correct poses by 9.1% over state-of-the-art zero-shot methods and is competitive with object-specific approaches, which promises faster deployment on novel workpieces without sacrificing generalization.
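The TL;DR states that pose updates are solved by weighted SVD. The sketch below shows the standard weighted Kabsch solution for $R \in SO(3)$ and $t \in \mathbb{R}^3$ from weighted model-to-scene correspondences. It is an illustration under assumptions, not ZeroBP's code: the function name `weighted_svd_pose` and the per-correspondence confidence weights `w` are hypothetical, and how ZeroBP actually derives its weights is not described here.

```python
import numpy as np

def weighted_svd_pose(src, dst, w):
    """Weighted Kabsch: find R in SO(3), t in R^3 minimising
    sum_i w_i * ||R @ src_i + t - dst_i||^2.

    src: (N, 3) CAD-model points, dst: (N, 3) matched scene points,
    w:   (N,) non-negative weights (hypothetical correspondence confidences).
    """
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                                  # normalise weights to sum to 1
    mu_src = (w[:, None] * src).sum(axis=0)          # weighted centroids
    mu_dst = (w[:, None] * dst).sum(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst        # centre both point sets
    H = (w[:, None] * src_c).T @ dst_c               # 3x3 weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))           # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_dst - R @ mu_src
    return R, t

# Demo: recover a known pose while five corrupted correspondences get zero weight.
rng = np.random.default_rng(0)
model = rng.normal(size=(50, 3))
R_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R_true) < 0:
    R_true[:, 0] *= -1                               # make it a proper rotation
t_true = np.array([0.1, -0.2, 0.5])
scene = model @ R_true.T + t_true                    # scene_i = R_true @ model_i + t_true
scene[:5] += rng.normal(scale=5.0, size=(5, 3))      # simulate five mismatches
w = rng.uniform(0.5, 1.0, size=50)
w[:5] = 0.0                                          # down-weight the mismatches
R_est, t_est = weighted_svd_pose(model, scene, w)
```

Zero-weight correspondences contribute nothing to the centroids or the cross-covariance, so the five corrupted pairs in the demo leave the recovered pose untouched; with soft weights they would only be attenuated.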

Abstract

Bin-picking is a practical and challenging robotic manipulation task in which accurate 6D pose estimation plays a pivotal role. The workpieces in bin-picking are typically textureless and randomly stacked in a bin, which poses a significant challenge to 6D pose estimation. Existing solutions are typically learning-based methods that require object-specific training, so their deployment for novel workpieces is severely slowed by data collection and model retraining. Zero-shot 6D pose estimation is a promising way to improve deployment efficiency. However, existing zero-shot 6D pose estimation methods rely on feature matching to establish point-to-point correspondences for pose estimation, which is less effective for workpieces with textureless appearances and ambiguous local regions. In this paper, we propose ZeroBP, a zero-shot pose estimation framework designed specifically for the bin-picking task. ZeroBP learns Position-Aware Correspondence (PAC) between the scene instance and its CAD model, leveraging both local features and global positions to resolve mismatches caused by ambiguous regions with similar shapes and appearances. Extensive experiments on the ROBI dataset demonstrate that ZeroBP outperforms state-of-the-art zero-shot pose estimation methods, improving the average recall of correct poses by 9.1%.
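The abstract's central argument is that local features alone cannot disambiguate regions with similar shapes. The toy example below is a hypothetical illustration, not ZeroBP's learned matcher: a workpiece has two ends with identical local descriptors, so feature-only matching ties and (through NumPy's first-index tie-break) picks the wrong end, while adding a global-position term selects the correct one. It assumes scene positions are already expressed in the model frame (for instance via an initial pose estimate) and uses a hand-picked weight `lam`; the additive scoring rule and the function name `match` are likewise assumptions.

```python
import numpy as np

def match(feat_scene, feat_model, pos_scene=None, pos_model=None, lam=1.0):
    """Best model index for each scene point.

    score = cosine similarity of local features
            - lam * squared distance of global positions (if positions given).
    Positions are assumed to share one frame (here, the model frame).
    """
    fs = feat_scene / np.linalg.norm(feat_scene, axis=1, keepdims=True)
    fm = feat_model / np.linalg.norm(feat_model, axis=1, keepdims=True)
    score = fs @ fm.T
    if pos_scene is not None:
        d2 = ((pos_scene[:, None, :] - pos_model[None, :, :]) ** 2).sum(axis=-1)
        score = score - lam * d2
    return score.argmax(axis=1)          # argmax returns the first index on ties

# Toy workpiece: both ends (x = -1 and x = +1) carry identical local descriptors.
END, SIDE = [1.0, 0.0], [0.0, 1.0]
pos_model = np.array([[-1.0, 0.0, 0.0], [-1.0, 0.1, 0.0],
                      [ 1.0, 0.0, 0.0], [ 1.0, 0.1, 0.0]])
feat_model = np.array([END, SIDE, END, SIDE])

# The scene observes only the +x end; the true matches are model points 2 and 3.
pos_scene = np.array([[1.02, 0.0, 0.0], [0.98, 0.1, 0.0]])
feat_scene = np.array([END, SIDE])

feature_only = match(feat_scene, feat_model)                            # wrong end
position_aware = match(feat_scene, feat_model, pos_scene, pos_model)    # right end
```

In ZeroBP the global position is fused with local features through the learned PACA module rather than added as a fixed penalty; the sketch only shows why a global-position signal breaks the tie.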

Paper Structure

This paper contains 18 sections, 4 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: (a) A brief overview of ZeroBP. Given the CAD model of an unseen object, the model generalizes to real-world bin-picking scenes for 6D pose estimation without retraining. (b) Existing local feature matching for 6D pose estimation. (c) The proposed position-aware correspondence learning for 6D pose estimation. "GPEnc" refers to Globally Positional Encoding, and "PA Cross-Attention" refers to Position-Aware Cross-Attention.
  • Figure 2: Overview of learning Position-Aware Correspondence (PAC) for zero-shot 6D pose estimation in bin-picking. We introduce the globally positional encoding and position-aware cross-attention modules to learn robust PAC, alleviating the mismatch issue on textureless bin-picking workpieces. Overall, we adopt a coarse-to-fine strategy that establishes correspondences from superpoints to points.
  • Figure 3: Visualizations of the 6D pose estimation results on the real-world bin-picking dataset ROBI [robi].
  • Figure 4: Visualizations of correspondences in layers.
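The Figure 2 caption describes establishing correspondences coarse-to-fine, from superpoints to points. A minimal sketch of that general pattern follows, under stated assumptions: points are grouped into patches by nearest superpoint centre, a superpoint's descriptor is the mean of its member point features, and both stages use mutual nearest neighbours on cosine similarity. All helper names are hypothetical, and ZeroBP's learned features, positional encoding, and PACA are not modelled here.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def mutual_nn(sim):
    """Pairs (i, j) where i and j are each other's most similar entry."""
    row_best, col_best = sim.argmax(axis=1), sim.argmax(axis=0)
    return [(i, j) for i, j in enumerate(row_best) if col_best[j] == i]

def group(points, centers):
    """Assign each point to its nearest superpoint centre (its patch label)."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def coarse_to_fine(s_pts, s_feat, s_ctr, m_pts, m_feat, m_ctr):
    """Scene-to-model point correspondences via superpoint-then-point matching.
    Assumes no patch is empty (true below because centres are sampled from the points)."""
    s_feat, m_feat = l2_normalize(s_feat), l2_normalize(m_feat)
    s_lbl, m_lbl = group(s_pts, s_ctr), group(m_pts, m_ctr)
    # Superpoint descriptor (assumption): mean of its member point features.
    s_sp = l2_normalize(np.stack([s_feat[s_lbl == k].mean(axis=0) for k in range(len(s_ctr))]))
    m_sp = l2_normalize(np.stack([m_feat[m_lbl == k].mean(axis=0) for k in range(len(m_ctr))]))
    corr = []
    # Coarse stage: match superpoints.
    for si, mj in mutual_nn(s_sp @ m_sp.T):
        s_idx, m_idx = np.flatnonzero(s_lbl == si), np.flatnonzero(m_lbl == mj)
        # Fine stage: match dense points only inside the matched patch pair.
        for a, b in mutual_nn(s_feat[s_idx] @ m_feat[m_idx].T):
            corr.append((int(s_idx[a]), int(m_idx[b])))
    return corr

# Demo: the "scene" is a shuffled copy of the model, so scene point a should match model point perm[a].
rng = np.random.default_rng(1)
m_pts = rng.normal(size=(20, 3))
m_feat = rng.normal(size=(20, 16))
perm = rng.permutation(20)
s_pts, s_feat = m_pts[perm], m_feat[perm]
ctr = m_pts[[0, 10]]                                 # two superpoint centres
corr = coarse_to_fine(s_pts, s_feat, ctr, m_pts, m_feat, ctr)
```

Restricting the fine search to matched patches keeps it cheap and local, but it also means a wrong coarse match cannot be repaired at the fine stage, which is one reason robust superpoint correspondences matter.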