A Helping (Human) Hand in Kinematic Structure Estimation

Adrian Pfisterer, Xing Li, Vito Mengers, Oliver Brock

TL;DR

This work addresses the reliable estimation of kinematic models for articulated objects under visual uncertainty, particularly occlusions and a lack of texture. It introduces a probabilistic, real-time framework that uses the human hand as a prior and decomposes the estimation into three hierarchical levels (landmark motion, hand-body motion, and kinematic-model inference), with explicit uncertainty modeling and outlier rejection. On a new dataset of challenging objects that are occluded during manipulation and offer only limited articulation, the approach outperforms two recent baselines by 195% and 140%, respectively, with lower variance, and its estimates are precise enough for a robot to manipulate even small objects safely. The results show the practical value of hand priors for perception-driven manipulation in uncontrolled environments, with possible extensions to contact-phase detection and viewpoint optimization.

Abstract

Visual uncertainties such as occlusions, lack of texture, and noise present significant challenges in obtaining accurate kinematic models for safe robotic manipulation. We introduce a probabilistic real-time approach that leverages the human hand as a prior to mitigate these uncertainties. By tracking the constrained motion of the human hand during manipulation and explicitly modeling uncertainties in visual observations, our method reliably estimates an object's kinematic model online. We validate our approach on a novel dataset featuring challenging objects that are occluded during manipulation and offer limited articulations for perception. The results demonstrate that by incorporating an appropriate prior and explicitly accounting for uncertainties, our method produces accurate estimates, outperforming two recent baselines by 195% and 140%, respectively. Furthermore, we demonstrate that our approach's estimates are precise enough to allow a robot to manipulate even small objects safely.
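
As a rough illustration of what "explicitly modeling uncertainties in visual observations" can mean in a recursive estimator, the following minimal sketch shows a scalar, Kalman-style measurement update that weights each observation by its variance and rejects outliers with a Mahalanobis-distance gate. This is a generic textbook technique, not the authors' implementation; the names `ScalarEstimate` and `gated_update` and the gate value of 9.0 (roughly three standard deviations) are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ScalarEstimate:
    """Gaussian belief over one scalar quantity (hypothetical helper type)."""
    mean: float
    variance: float


def gated_update(est, z, meas_variance, gate=9.0):
    """One uncertainty-weighted update with outlier rejection.

    est           -- current belief (ScalarEstimate)
    z             -- new observation of the same quantity
    meas_variance -- variance assigned to this observation (larger = less trusted)
    gate          -- threshold on the squared Mahalanobis distance (assumed value)

    Returns (new_estimate, accepted).
    """
    innovation = z - est.mean
    innovation_variance = est.variance + meas_variance
    # Squared Mahalanobis distance of the observation from the prediction.
    d2 = innovation ** 2 / innovation_variance
    if d2 > gate:
        # Observation is implausible given both uncertainties: reject it.
        return est, False
    # Kalman gain: noisy observations (large meas_variance) get a small weight.
    gain = est.variance / innovation_variance
    new_mean = est.mean + gain * innovation
    new_variance = (1.0 - gain) * est.variance
    return ScalarEstimate(new_mean, new_variance), True


if __name__ == "__main__":
    belief = ScalarEstimate(mean=0.0, variance=1.0)
    # (observation, variance) pairs; the second one is a gross outlier.
    for z, r in [(1.0, 1.0), (10.0, 1.0), (0.6, 4.0)]:
        belief, accepted = gated_update(belief, z, r)
        print(z, r, accepted, belief)
```

In this sketch, an observation made under heavy occlusion would simply be given a larger `meas_variance`, so it moves the estimate less, while an observation far outside the predicted range is discarded entirely.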

Paper Structure

This paper contains 21 sections, 5 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Our approach estimates kinematic models from human hand motions for challenging objects where tracking object-part motions is difficult due to occlusions and a lack of texture (left). By deliberately modeling uncertainties based on hand properties, our approach weights observations and rejects outliers resulting from uncertainties and noisy measurements (center), significantly improving the estimation results (right).
  • Figure 2: We decompose the estimation problem into three hierarchical levels, similar to the OMIP system [martin-martinCoupledRecursiveEstimation2022]. This hierarchical structure allows us to model uncertainties and reject outliers at two different levels, each grounded in physical priors related to hand properties.
  • Figure 3: Our method accurately estimates kinematic models of various challenging articulated objects from human hand motion in real-time.
  • Figure 4: Compared to the two baseline methods, our approach achieves significantly higher accuracy and less variance on the evaluation dataset. Additionally, the results show that our uncertainty models further improve the accuracy of the estimations.
  • Figure 5: The baseline methods struggle to generate accurate estimates due to uncertainties in visual observations. (a): When manipulating the tension belt, partial occlusion of the hand makes it challenging to detect the hand pose directly from images, which affects the estimation accuracy of Bahety et al. [bahety2024screwmimic]. (b): Similarly, occlusions and the limited range of observed articulations permitted by the valve result in a short, noisy landmark trajectory. Combined with the lack of orientation data, this leads to a poorly estimated rotation axis in the method of Regal et al. [regal2023usingSingle].
  • ...and 2 more figures