Physics-aware Hand-object Interaction Denoising
Haowen Luo, Yunze Liu, Li Yi
TL;DR
The paper addresses the difficulty of keeping reconstructed hand-object interaction sequences physically plausible under heavy occlusion by introducing a physically aware de-noising framework. It combines a dual HOI representation with a de-noising auto-encoder and two differentiable neural losses, $L_{grasp}$ and $L_{manip}$, which enforce grasp credibility and manipulation feasibility respectively; the network is trained in two stages. The approach improves both pose accuracy and physical plausibility on GRAB and HO-3D, for synthetic noise as well as real tracker errors, and is robust to variation in objects and noise patterns. Because the losses are differentiable, physical constraints can be enforced through end-to-end optimization, which makes the method relevant to hand-object tracking in VR/AR and robotics.
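As a rough illustration of how a reconstruction objective might be combined with the two physical terms, the sketch below adds weighted $L_{grasp}$ and $L_{manip}$ values to a mean-squared reconstruction error. Everything here is an assumption made for illustration: the function names, the weights `w_grasp` and `w_manip`, the flattened pose layout, and the `use_physics` switch (a guess at how a two-stage schedule could toggle the physical terms) do not come from the paper. In the paper the two losses are learned neural networks; here they are stand-in callables.

```python
from typing import Callable, Sequence

Pose = Sequence[float]  # hypothetical: a flattened hand pose sequence


def reconstruction_loss(pred: Pose, target: Pose) -> float:
    """Mean squared error between denoised and reference pose values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_loss(
    pred: Pose,
    target: Pose,
    grasp_loss: Callable[[Pose], float],   # stand-in for the learned L_grasp
    manip_loss: Callable[[Pose], float],   # stand-in for the learned L_manip
    w_grasp: float = 1.0,                  # assumed weight, not from the paper
    w_manip: float = 1.0,                  # assumed weight, not from the paper
    use_physics: bool = True,              # assumed switch between training stages
) -> float:
    """Reconstruction error plus optional weighted physical-plausibility terms."""
    loss = reconstruction_loss(pred, target)
    if use_physics:
        loss += w_grasp * grasp_loss(pred) + w_manip * manip_loss(pred)
    return loss


# Toy usage with constant stand-in losses.
pred, target = [0.0, 1.0], [0.0, 0.0]
stage_one = total_loss(pred, target, lambda p: 0.2, lambda p: 0.3, use_physics=False)
stage_two = total_loss(pred, target, lambda p: 0.2, lambda p: 0.3, w_manip=2.0)
```

In a real training setup the stand-in callables would be replaced by differentiable networks so that gradients from the physical terms reach the de-noising network.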
Abstract
The credibility and practicality of a reconstructed hand-object interaction sequence depend largely on its physical plausibility. However, because of heavy occlusion during hand-object interaction, physical plausibility remains difficult to achieve for purely vision-based tracking methods. To address this issue and improve the results of existing hand trackers, this paper proposes a novel physically aware hand motion de-noising method. Specifically, we introduce two learned loss terms that explicitly capture two crucial aspects of physical plausibility: grasp credibility and manipulation feasibility. These terms are used to train a physically aware de-noising network. Qualitative and quantitative experiments demonstrate that our approach significantly improves both fine-grained physical plausibility and overall pose accuracy, surpassing current state-of-the-art de-noising methods.
