Physics-aware Hand-object Interaction Denoising

Haowen Luo, Yunze Liu, Li Yi

TL;DR

The paper tackles the problem of maintaining physical plausibility in hand-object interaction sequences under heavy occlusion by introducing a physically-aware de-noising framework. It combines a dual HOI representation with a de-noising auto-encoder and two differentiable neural losses, $L_{grasp}$ and $L_{manip}$, to enforce grasp credibility and manipulation feasibility, trained in a two-stage regime. The approach demonstrates improvements in both pose accuracy and physical plausibility on synthetic-noise and real-tracker errors across GRAB and HO-3D, highlighting robustness to object variation and noise patterns. This work advances practical hand-object tracking for VR/AR and robotics by enabling end-to-end differentiable optimization that respects physical constraints.

Abstract

The credibility and practicality of a reconstructed hand-object interaction sequence depend largely on its physical plausibility. However, due to high occlusions during hand-object interaction, physical plausibility remains a challenging criterion for purely vision-based tracking methods. To address this issue and enhance the results of existing hand trackers, this paper proposes a novel physically-aware hand motion de-noising method. Specifically, we introduce two learned loss terms that explicitly capture two crucial aspects of physical plausibility: grasp credibility and manipulation feasibility. These terms are used to train a physically-aware de-noising network. Qualitative and quantitative experiments demonstrate that our approach significantly improves both fine-grained physical plausibility and overall pose accuracy, surpassing current state-of-the-art de-noising methods.
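
As a rough illustration of how two learned loss terms might enter the training objective of a de-noising network, the sketch below adds them to a reconstruction term. The additive form, the mean-squared-error reconstruction term, the weights `w_grasp` and `w_manip`, and every function name here are assumptions made for illustration; the abstract only states that the two learned terms are used to train the network.

```python
from typing import Callable, Sequence

Pose = Sequence[float]  # hypothetical flat hand-pose vector


def mse(a: Pose, b: Pose) -> float:
    """Mean squared error between two equally sized pose vectors."""
    if len(a) != len(b):
        raise ValueError("pose vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def denoising_objective(
    denoised: Pose,
    clean: Pose,
    grasp_loss: Callable[[Pose], float],
    manip_loss: Callable[[Pose], float],
    w_grasp: float = 1.0,
    w_manip: float = 1.0,
) -> float:
    """Reconstruction error plus two weighted plausibility terms.

    grasp_loss and manip_loss stand in for the learned loss networks;
    here they are arbitrary callables that score a pose (assumption).
    """
    return (
        mse(denoised, clean)
        + w_grasp * grasp_loss(denoised)
        + w_manip * manip_loss(denoised)
    )
```

In an actual training setup the two callables would be differentiable networks and the objective would be minimized by gradient descent; plain Python floats are used here only to keep the sketch self-contained.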

Paper Structure (19 sections, 10 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Given a noisy hand-object interaction sequence, our method produces de-noised hand poses conditioned on the object trajectory, mitigating physically implausible artifacts such as erroneous contact and hand-object penetration. In this example of a human hand (yellow) manipulating a hand model (grey), the de-noised result demonstrates higher physical plausibility. Please see our supplementary material for more animated results.
  • Figure 2: Overview of the training and inference frameworks.
  • Figure 3: The proposed grasp credibility loss and manipulation feasibility loss help quantify the physical plausibility of hand pose estimation results in a smooth way. The darkness of the hand color indicates the value of our proposed neural losses on frames with different noise levels. In this example, as the noise level of the hand poses decreases, the neural losses also decrease gradually despite the discrete nature of hand-object interaction (contact vs. non-contact), providing smooth guidance for training the physics-aware de-noising network.
  • Figure 4: Qualitative results on the GRAB dataset. TOCH produces physically implausible results, such as hand-object penetration, when dealing with thin and delicate parts of objects, while our results are more realistic.
  • Figure 5: Qualitative results on the HO-3D dataset. Our method effectively de-noises the tracking results and produces more physically plausible hand-object interaction than TOCH.