GISR: Geometric Initialization and Silhouette-based Refinement for Single-View Robot Pose and Configuration Estimation
Ivan Bilić, Filip Marić, Fabio Bonsignorio, Ivan Petrović
TL;DR
The paper introduces GISR, a real-time framework that jointly estimates the 6D camera-to-robot pose and the robot configuration from a single RGB image. It combines a geometric initialization module, built on a differentiable EDM-based pipeline, with a silhouette-based refinement module that iteratively updates pose and configuration using a fast silhouette renderer. Training optimizes both configuration and pose losses, and the refinement module learns to correct the errors left by the initialization. The reported runtime is around $40$ ms, with a better speed-accuracy trade-off than dense RGB methods. Experiments on Panda-3Cam show strong generalization and competitive performance against state-of-the-art methods, including those that require ground-truth proprioception. The approach highlights the benefits of combining geometric priors with efficient silhouette-based refinement for online, dynamic robotics scenarios, and points toward an extension to unknown robot kinematics.
Abstract
In autonomous robotics, measurement of the robot's internal state and perception of its environment, including interaction with other agents such as collaborative robots, are essential. Estimating the pose of the robot arm from a single view has the potential to replace classical eye-to-hand calibration approaches and is particularly attractive for online estimation and dynamic environments. In addition to its pose, recovering the robot configuration provides a complete spatial understanding of the observed robot that can be used to anticipate the actions of other agents in advanced robotics use cases. Furthermore, this additional redundancy enables the planning and execution of recovery protocols in case of sensor failures or external disturbances. We introduce GISR, a deep configuration and robot-to-camera pose estimation method that prioritizes real-time execution. GISR consists of two modules: (i) a geometric initialization module that efficiently computes an approximate robot pose and configuration, and (ii) a deep iterative silhouette-based refinement module that arrives at a final solution in just a few iterations. We evaluate GISR on publicly available data and show that it outperforms existing methods of the same class in terms of both speed and accuracy, and can compete with approaches that rely on ground-truth proprioception and recover only the pose.
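
The two-stage structure described above (an approximate initial estimate followed by a few refinement iterations) can be illustrated with a toy sketch. This is not the authors' implementation: the flat state vector, the additive update rule, and the names `refine`, `predict_update`, and `toy_update` are all assumptions made for illustration. In the paper the update would come from a learned module that compares a rendered silhouette with the observed one, and a real pose update would not be purely additive.

```python
from typing import Callable, List, Sequence

# Hypothetical flat state: pose parameters followed by joint angles.
State = List[float]


def refine(initial: Sequence[float],
           predict_update: Callable[[Sequence[float]], Sequence[float]],
           num_iters: int = 3) -> State:
    """Apply a fixed, small number of additive updates to an initial estimate.

    `predict_update` stands in for a learned refinement step; here it is
    just a callback that returns a correction for the current state.
    """
    state = list(initial)
    for _ in range(num_iters):
        delta = predict_update(state)
        state = [s + d for s, d in zip(state, delta)]
    return state


# Toy stand-in for the refinement step: move halfway toward a known target.
# A real system has no access to the target; this only demonstrates the loop.
target = [1.0, 2.0, 3.0]


def toy_update(state: Sequence[float]) -> List[float]:
    return [0.5 * (t - s) for t, s in zip(target, state)]


# Pretend the initialization stage returned a rough estimate of all zeros.
estimate = refine([0.0, 0.0, 0.0], toy_update, num_iters=3)
```

With this toy update, each iteration halves the remaining error, so three iterations leave one eighth of the initial error. The point of the sketch is only the control flow: a cheap initial guess, then a short, fixed-length refinement loop.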
