Foundation Feature-Driven Online End-Effector Pose Estimation: A Marker-Free and Learning-Free Approach

Tianshu Wu, Jiyao Zhang, Shiqian Liang, Zhengxiao Han, Hao Dong

TL;DR

This work tackles the challenge of online, marker-free end-effector pose estimation with cross-robot generalization and no training. It introduces FEEPE, which leverages foundation-model features to establish 2D-3D correspondences from pre-rendered CAD templates and estimates an initial $SE(3)$ pose via PnP, followed by a multi-historical key-frame optimization that uses temporal information and robot priors to resolve symmetry and partial-observation ambiguities. The approach is validated on synthetic and real datasets, outperforming both CAD-model-based and learning-based robot pose estimation methods, and enabling accurate online calibration and grasping without markers. Limitations include the need for an end-effector CAD model and real-time segmentation, with future work aiming at mask-free operation and broader applicability.

Abstract

Accurate transformation estimation between camera space and robot space is essential. Traditional methods using markers for hand-eye calibration require offline image collection, limiting their suitability for online self-calibration. Recent learning-based robot pose estimation methods, while advancing online calibration, struggle with cross-robot generalization and require the robot to be fully visible. This work proposes a Foundation feature-driven online End-Effector Pose Estimation (FEEPE) algorithm, characterized by its training-free and cross end-effector generalization capabilities. Inspired by the zero-shot generalization capabilities of foundation models, FEEPE leverages pre-trained visual features to estimate 2D-3D correspondences derived from the CAD model and target image, enabling 6D pose estimation via the PnP algorithm. To resolve ambiguities from partial observations and symmetry, a multi-historical key frame enhanced pose optimization algorithm is introduced, utilizing temporal information for improved accuracy. Compared to traditional hand-eye calibration, FEEPE enables marker-free online calibration. Unlike robot pose estimation, it generalizes across robots and end-effectors in a training-free manner. Extensive experiments demonstrate its superior flexibility, generalization, and performance.
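
The PnP step mentioned above recovers a 6D pose from 2D-3D correspondences. The following is a minimal sketch of that idea only, using a plain Direct Linear Transform (DLT) in NumPy as a stand-in; it is not claimed to be the solver used by FEEPE, and a robust library solver with outlier rejection would normally be preferred in practice. The function name `pnp_dlt`, the intrinsics `K`, and the synthetic points are all illustrative assumptions.

```python
import numpy as np


def pnp_dlt(points_3d, points_2d, K):
    """Estimate (R, t) with x ~ K (R X + t) from >= 6 non-coplanar matches.

    Illustrative DLT only: no outlier rejection, no refinement.
    """
    n = points_3d.shape[0]
    # Move pixels to normalized camera coordinates so K drops out.
    uv1 = np.hstack([points_2d, np.ones((n, 1))])
    xy = (np.linalg.inv(K) @ uv1.T).T[:, :2]
    Xh = np.hstack([points_3d, np.ones((n, 1))])

    # Each match gives two linear equations in the 12 entries of P = [R | t].
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = Xh
    A[0::2, 8:12] = -xy[:, [0]] * Xh
    A[1::2, 4:8] = Xh
    A[1::2, 8:12] = -xy[:, [1]] * Xh

    # The null vector of A is P up to an unknown scale (and sign).
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)

    # Project the 3x3 block onto the nearest rotation and undo the scale.
    U, S, Vt_m = np.linalg.svd(P[:, :3])
    R = U @ Vt_m
    t = P[:, 3] / S.mean()
    if np.linalg.det(R) < 0:  # resolve the sign ambiguity
        R, t = -R, -t
    return R, t


# Synthetic check with an assumed pinhole camera and a known pose.
rng = np.random.default_rng(0)
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
a, b = 0.3, 0.2
Rz = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
Rx = np.array([[1.0, 0.0, 0.0], [0.0, np.cos(b), -np.sin(b)], [0.0, np.sin(b), np.cos(b)]])
R_true = Rz @ Rx
t_true = np.array([0.1, -0.05, 0.8])

pts_3d = rng.uniform(-0.1, 0.1, size=(20, 3))  # stand-in for CAD surface points
cam = pts_3d @ R_true.T + t_true               # points in the camera frame
proj = cam @ K.T
pts_2d = proj[:, :2] / proj[:, 2:3]            # perspective division

R_est, t_est = pnp_dlt(pts_3d, pts_2d, K)
print(np.abs(R_est - R_true).max(), np.abs(t_est - t_true).max())
```

With noise-free matches the recovered pose agrees with the ground truth up to numerical precision. Real matches contain outliers, which is one reason a single-frame estimate can be ambiguous and why the abstract pairs PnP with a later optimization stage.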

Paper Structure

This paper contains 21 sections, 8 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: We propose FEEPE (Foundation feature-driven End-Effector Pose Estimation), an online, marker-free, training-free method for pose estimation that generalizes across robots and end-effectors.
  • Figure 2: Overview of FEEPE. Given the 3D model of the end-effector and a target image, we first render multi-view templates. Using foundation features, we find the top $K_r$ references most similar to the target image and compute 2D-3D matches and pose candidates (Section \ref{sec:2d-3d_matching}). To address ambiguities from partial observations, we introduce a global memory pool (Section \ref{sec:memory_pool}) that records keyframes and robot states for pose optimization (Section \ref{sec:pose_optimization}). To resolve ambiguities from symmetry, we propose a symmetry disambiguation module (Section \ref{sec:symmetry_disambiguation}) to eliminate incorrect matches.
  • Figure 3: Average accuracy curves of different methods. MegaPose† denotes MegaPose-RGB \cite{labbe2022megapose} with multi-hypothesis and ICP, and MegaPose denotes MegaPose-RGBD \cite{labbe2022megapose}.
  • Figure 4: Feature visualizations from various DINOv2 layers.
  • Figure 5: Results of high-precision targeting experiments.
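
The template retrieval step described in the Figure 2 caption (finding the top $K_r$ references most similar to the target image) can be sketched as a cosine-similarity ranking. This is a hedged illustration under stated assumptions: each image is summarized by a single global feature vector, the function name `top_k_templates` is hypothetical, and the random vectors merely stand in for foundation-model features. The source does not specify how FEEPE pools or compares features.

```python
import numpy as np


def top_k_templates(target_feat, template_feats, k_r):
    """Return indices and cosine similarities of the k_r closest templates."""
    # L2-normalize so the dot product equals cosine similarity.
    t = target_feat / np.linalg.norm(target_feat)
    T = template_feats / np.linalg.norm(template_feats, axis=1, keepdims=True)
    sims = T @ t
    order = np.argsort(-sims)[:k_r]  # highest similarity first
    return order, sims[order]


# Stand-in descriptors: 8 rendered templates, 16-dimensional features (assumed sizes).
rng = np.random.default_rng(0)
templates = rng.normal(size=(8, 16))
# A target that is a slightly perturbed copy of template 3.
target = templates[3] + 0.01 * rng.normal(size=16)

idx, sims = top_k_templates(target, templates, k_r=3)
print(idx, sims)
```

Keeping several references instead of only the best one yields multiple pose candidates, which matches the caption's mention of computing pose candidates from the retrieved references.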