Hand-object reconstruction via interaction-aware graph attention mechanism

Taeyun Woo, Tae-Kyun Kim, Jinah Park

TL;DR

Addressing the challenge of jointly estimating hand and object poses with physically plausible interaction, the paper proposes an interaction-aware graph attention mechanism. It builds four interconnected graphs using common relation edges $E_c$ and attention-guided edges $E_a$, enabling cross-class message passing over hand-hand, hand-object, object-hand, and object-object node pairs. A refinement stage with four graph convolution (GC) blocks predicts vertex displacements that update the hand and object meshes. Training uses losses on both the initial estimate and the refined result. Experiments on ObMan and DexYCB show improved physical plausibility, most notably lower maximum penetration and intersection volume, and ablations confirm that both edge types contribute. The approach aims at more realistic hand-object interaction for VR/AR and robotics applications.
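
To make the four node-pair types concrete, here is a minimal sketch. It assumes that hand and object vertices are stacked into one node set, so the directed edges split into hand-hand, hand-object, object-hand, and object-object blocks. The attention score (negative squared feature distance), the parameter-free displacement rule, and the names `edges_by_block`, `attention_weights`, and `refine` are hypothetical illustrations, not the paper's learned GC blocks. Which toy edges are labelled $E_c$ versus $E_a$ is also made up. The sketch only shows how softmax attention restricted to the union of the two edge sets can produce per-vertex displacements that are added to the initial meshes.

```python
import math


def node_type(i, n_hand):
    # Nodes 0..n_hand-1 are hand vertices; the remaining nodes are object vertices.
    return "hand" if i < n_hand else "object"


def edges_by_block(edges, n_hand):
    """Split directed (receiver, sender) edges into the four node-pair blocks."""
    blocks = {
        ("hand", "hand"): [],
        ("hand", "object"): [],
        ("object", "hand"): [],
        ("object", "object"): [],
    }
    for i, j in sorted(edges):
        blocks[(node_type(i, n_hand), node_type(j, n_hand))].append((i, j))
    return blocks


def attention_weights(feats, edges):
    """Softmax attention over each receiver's incoming edges.

    The score is the negative squared feature distance, a hypothetical
    stand-in for a learned attention function.
    """
    weights = {}
    for i in range(len(feats)):
        senders = [j for j in range(len(feats)) if (i, j) in edges]
        if not senders:
            continue
        scores = [-sum((a - b) ** 2 for a, b in zip(feats[i], feats[j])) for j in senders]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        for j, e in zip(senders, exps):
            weights[(i, j)] = e / total
    return weights


def refine(positions, feats, edges, step=0.5):
    """Add one attention-weighted displacement to every vertex.

    The displacement is the attention-weighted mean offset towards the
    neighbours, a toy substitute for a learned displacement head.
    """
    w = attention_weights(feats, edges)
    refined = []
    for i, p in enumerate(positions):
        delta = [0.0] * len(p)
        for (recv, j), wij in w.items():
            if recv == i:
                for d in range(len(p)):
                    delta[d] += wij * (positions[j][d] - p[d])
        refined.append([p[d] + step * delta[d] for d in range(len(p))])
    return refined


# Toy example: two hand vertices (0, 1) and two object vertices (2, 3) on a line.
positions = [[0.0], [1.0], [2.0], [3.0]]
e_c = {(0, 1), (1, 0), (2, 3), (3, 2)}  # hypothetical within-graph edges
e_a = {(1, 2), (2, 1)}                  # hypothetical cross-graph edges
edges = e_c | e_a
blocks = edges_by_block(edges, n_hand=2)
print({k: len(v) for k, v in blocks.items()})
print(refine(positions, positions, edges))
```

In this toy run the interior vertices stay put, because their hand and object neighbours pull equally in opposite directions. The end vertices move towards their single neighbour. A learned model would instead produce displacements from trained features.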

Abstract

Estimating the poses of both a hand and an object has become an important area of research due to the growing need for advanced vision computing. The primary challenge is understanding and reconstructing how hands and objects interact, including contact and physical plausibility. Existing approaches often adopt a graph neural network to incorporate spatial information from hand and object meshes. However, because they leave the edges within and between the hand and object graphs unmodified, these approaches have not fully exploited the potential of graphs. We propose a graph-based refinement method that incorporates an interaction-aware graph-attention mechanism to account for hand-object interactions. Using edges, we establish connections among closely correlated nodes, both within individual graphs and across different graphs. Experiments demonstrate the effectiveness of the proposed method, with notable improvements in physical plausibility.
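
The idea of connecting closely correlated nodes, both within a graph and across graphs, can be illustrated with a nearest-neighbour rule. This is a minimal sketch that assumes node features are plain vectors and takes "closely correlated" to mean the smallest squared feature distance. The function name `correlated_edges` and the `k_within`/`k_cross` parameters are hypothetical. The paper's own edge definitions (common relation edges and attention-guided edges) may be constructed differently.

```python
def sq_dist(u, v):
    # Squared Euclidean distance between two feature vectors.
    return sum((a - b) ** 2 for a, b in zip(u, v))


def correlated_edges(feats, n_hand, k_within=1, k_cross=1):
    """Link each node to its most similar nodes in its own graph and in the other graph.

    Nodes 0..n_hand-1 belong to the hand graph, the rest to the object graph.
    Similarity = smallest squared feature distance (a hypothetical choice).
    Returns a set of directed (receiver, sender) pairs.
    """
    n = len(feats)
    edges = set()
    for i in range(n):
        in_hand = i < n_hand
        same = [c for c in range(n) if c != i and (c < n_hand) == in_hand]
        other = [c for c in range(n) if (c < n_hand) != in_hand]
        for pool, k in ((same, k_within), (other, k_cross)):
            ranked = sorted(pool, key=lambda c: sq_dist(feats[i], feats[c]))
            for c in ranked[:k]:
                edges.add((i, c))
    return edges


feats = [[0.0], [1.0], [1.2], [5.0]]  # hand nodes: 0, 1; object nodes: 2, 3
edges = correlated_edges(feats, n_hand=2)
cross = sorted(e for e in edges if (e[0] < 2) != (e[1] < 2))
print(cross)
```

Keeping separate budgets for within-graph and cross-graph neighbours guarantees that every node receives at least one message from the other class. This is the kind of cross-class message passing that the interaction-aware design relies on.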

Paper Structure

This paper contains 12 sections, 2 equations, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Overview of the proposed method.
  • Figure 2: Design of the graph convolution (GC) blocks.
  • Figure 3: Qualitative results on ObMan [hasson] (rows 1-2) and DexYCB [dexycb] (rows 3-4).
  • Figure 4: Effect of our refinement stage. Red regions show the intersection volume.
  • Figure 5: Qualitative comparisons from the ablation study.
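
Maximum penetration is one of the plausibility measures highlighted above, and Figure 4 visualizes the related intersection volume. The following sketch illustrates the penetration measure only. It replaces the object mesh with a sphere, which is a simplification: the benchmark metric is measured against the actual object mesh. The function name `max_penetration_sphere` is hypothetical, and the code needs Python 3.8+ for `math.dist`.

```python
import math


def max_penetration_sphere(hand_vertices, center, radius):
    """Deepest intrusion of any hand vertex into a spherical object proxy.

    A vertex at distance d from the centre penetrates by radius - d when
    d < radius. Returns 0.0 when no vertex lies inside the sphere.
    """
    depth = 0.0
    for v in hand_vertices:
        depth = max(depth, radius - math.dist(v, center))
    return depth


hand = [(0.0, 0.0, 1.5), (0.0, 0.0, 0.5), (2.0, 0.0, 0.0)]
print(max_penetration_sphere(hand, center=(0.0, 0.0, 0.0), radius=1.0))
```

A refinement stage that pushes the second vertex out of the sphere would drive this value towards 0.0.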