A Helping (Human) Hand in Kinematic Structure Estimation
Adrian Pfisterer, Xing Li, Vito Mengers, Oliver Brock
TL;DR
This work addresses the reliable estimation of kinematic models for articulated objects under visual uncertainty, particularly occlusion and poorly textured scenes. It introduces a probabilistic, real-time framework that uses the human hand as a perceptual prior and decomposes estimation into three parts: landmark motion, hand-body motion, and kinematic-model inference, all using uncertainty-aware filters. On a new benchmark of challenging objects that are occluded during manipulation and offer limited articulation, the approach outperforms two recent baselines by 195% and 140%, respectively, while maintaining low variance, and its real-time estimates are accurate enough for direct robotic manipulation. The results show the practical value of hand priors for perception-driven manipulation in uncontrolled environments, with contact-phase detection and viewpoint optimization as natural extensions.
Abstract
Visual uncertainties such as occlusions, lack of texture, and noise present significant challenges in obtaining accurate kinematic models for safe robotic manipulation. We introduce a probabilistic real-time approach that leverages the human hand as a prior to mitigate these uncertainties. By tracking the constrained motion of the human hand during manipulation and explicitly modeling uncertainties in visual observations, our method reliably estimates an object's kinematic model online. We validate our approach on a novel dataset featuring challenging objects that are occluded during manipulation and offer limited articulations for perception. The results demonstrate that by incorporating an appropriate prior and explicitly accounting for uncertainties, our method produces accurate estimates, outperforming two recent baselines by 195% and 140%, respectively. Furthermore, we demonstrate that our approach's estimates are precise enough to allow a robot to manipulate even small objects safely.
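To make the idea of inferring a kinematic model from constrained hand motion concrete, the following is a minimal sketch and not the authors' method. It assumes a tracked 3D hand trajectory with a per-frame noise estimate, weights each frame by its inverse variance so that uncertain (for example, occluded) frames count less, and compares a straight-line (prismatic) fit with a planar circle (revolute) fit. The function names, the PCA and algebraic circle fits, and the `ratio` threshold are all illustrative assumptions; the paper describes an online probabilistic filter rather than this batch fit.

```python
import numpy as np


def _weighted_pca(points, weights):
    """Return weighted centroid, centered points, and eigenvectors (ascending)."""
    w = weights / weights.sum()
    centroid = (w[:, None] * points).sum(axis=0)
    centered = points - centroid
    cov = (w[:, None] * centered).T @ centered
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues come back in ascending order
    return w, centroid, centered, eigvecs


def fit_prismatic(points, weights):
    """Weighted line fit. Returns (axis, weighted RMS distance to the line)."""
    w, _, centered, eigvecs = _weighted_pca(points, weights)
    axis = eigvecs[:, -1]  # direction of largest variance
    along = centered @ axis
    resid = centered - np.outer(along, axis)
    rms = np.sqrt((w * (resid ** 2).sum(axis=1)).sum())
    return axis, rms


def fit_revolute(points, weights):
    """Weighted plane + algebraic circle fit.

    Returns (axis, center, radius, weighted RMS distance to the circle).
    """
    w, centroid, centered, eigvecs = _weighted_pca(points, weights)
    normal = eigvecs[:, 0]  # smallest variance: plane normal = rotation axis
    u, v = eigvecs[:, 2], eigvecs[:, 1]  # in-plane basis
    x, y = centered @ u, centered @ v
    # Circle (x-a)^2 + (y-b)^2 = r^2 rewritten as a linear system:
    # x^2 + y^2 = 2a*x + 2b*y + c, with c = r^2 - a^2 - b^2
    A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    rhs = x ** 2 + y ** 2
    sw = np.sqrt(w)
    (a, b, c), *_ = np.linalg.lstsq(A * sw[:, None], rhs * sw, rcond=None)
    radius = np.sqrt(max(c + a ** 2 + b ** 2, 0.0))
    center = centroid + a * u + b * v
    out_of_plane = centered @ normal
    radial = np.hypot(x - a, y - b) - radius
    rms = np.sqrt((w * (out_of_plane ** 2 + radial ** 2)).sum())
    return normal, center, radius, rms


def classify_joint(points, sigmas, ratio=2.0):
    """Pick 'revolute' only if the circle fits at least `ratio` times better.

    points: (N, 3) hand positions; sigmas: (N,) per-frame noise std (assumed known).
    The ratio test is a crude guard against the circle model overfitting a line.
    """
    points = np.asarray(points, dtype=float)
    weights = 1.0 / np.asarray(sigmas, dtype=float) ** 2
    line_axis, line_rms = fit_prismatic(points, weights)
    rot_axis, center, radius, circ_rms = fit_revolute(points, weights)
    if line_rms > ratio * circ_rms:
        return {"type": "revolute", "axis": rot_axis,
                "center": center, "radius": radius}
    return {"type": "prismatic", "axis": line_axis}
```

Calling `classify_joint(points, sigmas)` on a trajectory recorded while a hand opens a door should favor the revolute model, while a drawer pull should favor the prismatic one. Frames with large `sigmas` contribute little to either fit, which is a simple stand-in for explicitly modeling uncertainty in the visual observations.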
