Foundation Feature-Driven Online End-Effector Pose Estimation: A Marker-Free and Learning-Free Approach

Tianshu Wu; Jiyao Zhang; Shiqian Liang; Zhengxiao Han; Hao Dong

TL;DR

This work tackles the challenge of online, marker-free end-effector pose estimation with cross-robot generalization and no training. It introduces FEEPE, which leverages foundation-model features to establish 2D-3D correspondences from pre-rendered CAD templates and estimates an initial $SE(3)$ pose via PnP, followed by a multi-historical key-frame optimization that uses temporal information and robot priors to resolve symmetry and partial-observation ambiguities. The approach is validated on synthetic and real datasets, outperforming both CAD-model-based and learning-based robot pose estimation methods, and enabling accurate online calibration and grasping without markers. Limitations include the need for an end-effector CAD model and real-time segmentation, with future work aiming at mask-free operation and broader applicability.
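The correspondence step can be illustrated with a minimal sketch. It assumes that each image location and each pre-rendered template point already carries a descriptor from some foundation model, and it pairs them by mutual nearest neighbour under cosine similarity. The function names, the `min_sim` threshold, and the mutual-nearest-neighbour rule are illustrative assumptions, not details given in the text above; the resulting 2D-3D pairs are what a PnP solver would take as input.

```python
import math

def cosine(a, b):
    """Cosine similarity between two descriptor vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def match_2d_3d(image_feats, template_feats, min_sim=0.5):
    """Pair image pixels with CAD-template points (hypothetical helper).

    image_feats:    list of ((u, v), descriptor)
    template_feats: list of ((x, y, z), descriptor)
    Returns a list of ((u, v), (x, y, z)) pairs that are mutual nearest
    neighbours and whose similarity is at least min_sim (assumed threshold).
    """
    if not image_feats or not template_feats:
        return []
    # Similarity matrix: rows are image features, columns are template features.
    sim = [[cosine(fi, ft) for _, ft in template_feats] for _, fi in image_feats]
    n_img, n_tpl = len(image_feats), len(template_feats)
    # Best template index for each image feature, and vice versa.
    best_t = [max(range(n_tpl), key=lambda j: row[j]) for row in sim]
    best_i = [max(range(n_img), key=lambda i: sim[i][j]) for j in range(n_tpl)]
    pairs = []
    for i, j in enumerate(best_t):
        # Keep the pair only if the choice is mutual and confident enough.
        if best_i[j] == i and sim[i][j] >= min_sim:
            pairs.append((image_feats[i][0], template_feats[j][0]))
    return pairs
```

The mutual check discards image features that merely happen to be closest to a template point already claimed by a better match, which is one simple way to limit outliers before pose estimation.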

Abstract

Accurate transformation estimation between camera space and robot space is essential. Traditional methods using markers for hand-eye calibration require offline image collection, limiting their suitability for online self-calibration. Recent learning-based robot pose estimation methods, while advancing online calibration, struggle with cross-robot generalization and require the robot to be fully visible. This work proposes a Foundation feature-driven online End-Effector Pose Estimation (FEEPE) algorithm, which is training-free and generalizes across end-effectors. Inspired by the zero-shot generalization capabilities of foundation models, FEEPE leverages pre-trained visual features to estimate 2D-3D correspondences between the CAD model and the target image, enabling 6D pose estimation via the PnP algorithm. To resolve ambiguities from partial observations and symmetry, a multi-historical key-frame enhanced pose optimization algorithm is introduced, which uses temporal information to improve accuracy. Compared to traditional hand-eye calibration, FEEPE enables marker-free online calibration. Unlike learning-based robot pose estimation methods, it generalizes across robots and end-effectors in a training-free manner. Extensive experiments demonstrate its superior flexibility, generalization, and performance.
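How an end-effector pose leads to a camera-to-robot transformation can be sketched with standard rigid-transform composition. The sketch assumes two 4x4 homogeneous matrices are available: the end-effector pose in the camera frame (the quantity a method such as FEEPE estimates) and the end-effector pose in the robot base frame (from forward kinematics). The function and variable names are hypothetical, and the abstract does not state that calibration is computed exactly this way.

```python
def matmul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert_rigid(t):
    """Inverse of a rigid transform [R p; 0 1], which is [R^T  -R^T p; 0 1]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]  # transpose of R
    p = [t[i][3] for i in range(3)]
    new_p = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [new_p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def camera_from_base(t_cam_ee, t_base_ee):
    """Transform mapping base-frame points into the camera frame.

    t_cam_ee:  end-effector pose in the camera frame (assumed estimated)
    t_base_ee: end-effector pose in the base frame (assumed from kinematics)
    """
    return matmul(t_cam_ee, invert_rigid(t_base_ee))

# Example: end-effector 0.5 m forward and 0.3 m up in the base frame,
# seen by the camera 1 m ahead and rotated 90 degrees about the optical axis.
t_base_ee = [[1.0, 0.0, 0.0, 0.5],
             [0.0, 1.0, 0.0, 0.0],
             [0.0, 0.0, 1.0, 0.3],
             [0.0, 0.0, 0.0, 1.0]]
t_cam_ee = [[0.0, -1.0, 0.0, 0.0],
            [1.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 1.0],
            [0.0, 0.0, 0.0, 1.0]]
t_cam_base = camera_from_base(t_cam_ee, t_base_ee)
```

Because the composition needs only one image and the current joint state, it can be recomputed on every frame, which is what makes calibration of this kind feasible online rather than from an offline image collection.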
