Real-time Holistic Robot Pose Estimation with Unknown States
Shikun Ban, Juling Fan, Xiaoxuan Ma, Wentao Zhu, Yu Qiao, Yizhou Wang
TL;DR
This work tackles monocular holistic robot pose estimation when internal joint states are unknown. It introduces a modular, end-to-end framework composed of DepthNet, JointNet, RotationNet, and KeypointNet that predicts camera-to-robot rotation, joint states, root depth, and root-relative 3D keypoints, all in a single feed-forward pass. DepthNet disentangles depth from camera intrinsics, KeypointNet provides pixel-aligned 3D keypoints, and differentiable forward kinematics fuse these estimates into accurate 3D poses; self-supervision further enhances sim-to-real generalization. The approach achieves state-of-the-art accuracy while delivering a $12\times$ speedup over iterative Render-and-Compare methods, enabling real-time holistic robot pose estimation for diverse morphologies and real-world scenarios.
Abstract
Estimating robot pose from RGB images is a crucial problem in computer vision and robotics. While previous methods have achieved promising performance, most of them presume full knowledge of the robot's internal states, e.g., ground-truth joint angles. However, this assumption is not always valid in practice. In real-world applications such as multi-robot collaboration or human-robot interaction, the robot joint states might not be shared or could be unreliable. On the other hand, existing approaches that estimate robot pose without joint state priors carry a heavy computational burden and thus cannot support real-time applications. This work introduces an efficient framework for real-time robot pose estimation from RGB images without requiring known robot states. Our method estimates camera-to-robot rotation, robot state parameters, keypoint locations, and root depth, employing a neural network module for each task to facilitate learning and sim-to-real transfer. Notably, it performs inference in a single feed-forward pass without iterative optimization. Our approach offers a 12-fold speedup with state-of-the-art accuracy, enabling real-time holistic robot pose estimation for the first time. Code and models are available at https://github.com/Oliverbansk/Holistic-Robot-Pose-Estimation.
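To make the role of the four estimated quantities concrete, the following is a minimal sketch of how joint states, a camera-to-robot rotation, and a root depth could be combined through forward kinematics into camera-frame 3D keypoints. It is not the authors' implementation. The toy serial chain (every joint rotating about its local z-axis), the function names, the intrinsics matrix `K`, and the use of a root pixel location to back-project the root are all assumptions made for illustration. NumPy is used for brevity and is not differentiable; a trainable version would need an autodiff framework.

```python
import numpy as np

def rot_z(theta):
    """Rotation matrix for an angle theta (radians) about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def forward_kinematics(joint_angles, link_offsets):
    """Root-relative keypoints of a toy serial chain, in the robot frame.

    Assumption: joint i rotates about its local z-axis, and link_offsets[i]
    is the offset to the next keypoint expressed in the rotated joint frame.
    """
    R = np.eye(3)
    p = np.zeros(3)
    points = [p.copy()]  # the root keypoint sits at the origin
    for theta, offset in zip(joint_angles, link_offsets):
        R = R @ rot_z(theta)
        p = p + R @ np.asarray(offset, dtype=float)
        points.append(p.copy())
    return np.stack(points)

def back_project_root(root_pixel, root_depth, K):
    """Lift the root pixel to a 3D camera-frame point at the given depth (z)."""
    u, v = root_pixel
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray with z = 1
    return root_depth * ray

def compose_pose(joint_angles, link_offsets, R_cam_robot, root_pixel, root_depth, K):
    """Camera-frame keypoints: rotate root-relative keypoints, then add the root."""
    kp_robot = forward_kinematics(joint_angles, link_offsets)
    root_cam = back_project_root(root_pixel, root_depth, K)
    return kp_robot @ R_cam_robot.T + root_cam

def project(points_cam, K):
    """Pinhole projection of camera-frame points to pixel coordinates."""
    uvw = points_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

# Hypothetical inputs standing in for network predictions.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
angles = [np.pi / 2, -np.pi / 2]
offsets = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
keypoints_cam = compose_pose(angles, offsets, np.eye(3), (320.0, 240.0), 2.0, K)
print(keypoints_cam)
print(project(keypoints_cam, K))
```

Because every step is a closed-form matrix operation, the whole composition runs in one pass with no iterative refinement, which is the property the abstract contrasts with Render-and-Compare methods.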
