MLPHand: Real Time Multi-View 3D Hand Mesh Reconstruction via MLP Modeling

Jian Yang, Jiakun Li, Guoming Li, Zhen Shen, Huai-Yu Wu, Zhaoxin Fan, Heng Huang

TL;DR

Experiments demonstrate that MLPHand can reduce computational complexity by 90% while achieving comparable reconstruction accuracy to existing state-of-the-art baselines.

Abstract

Multi-view hand mesh reconstruction is a critical task for applications in virtual reality and human-computer interaction, but it remains a formidable challenge. Although existing multi-view hand reconstruction methods achieve remarkable accuracy, they typically carry a heavy computational burden that hinders real-time inference. To this end, we propose MLPHand, a novel method designed for real-time multi-view single-hand reconstruction. MLPHand consists of two primary modules: (1) a lightweight MLP-based Skeleton2Mesh model that efficiently recovers hand meshes from hand skeletons, and (2) a multi-view geometry feature fusion prediction module that enhances the Skeleton2Mesh model with detailed geometric information from multiple views. Experiments on three widely used datasets demonstrate that MLPHand can reduce computational complexity by 90% while achieving reconstruction accuracy comparable to existing state-of-the-art baselines.
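
To make the skeleton-to-mesh idea concrete, the sketch below shows a plain fully connected network that maps 3D hand joints to mesh vertices. It is a generic illustration only, not the authors' Skeleton2Mesh architecture: the joint count (21), the vertex count (778, as in MANO-style meshes), the hidden width, the wrist-centred normalisation, and the function names `init_mlp` and `skeleton_to_mesh` are all assumptions introduced here, and the weights are random rather than trained.

```python
import numpy as np

NUM_JOINTS = 21      # assumption: common 21-keypoint hand skeleton
NUM_VERTICES = 778   # assumption: MANO-style mesh resolution
HIDDEN = 256         # hypothetical hidden width

def init_mlp(rng, sizes):
    """Create (weight, bias) pairs for a fully connected network."""
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def skeleton_to_mesh(params, joints):
    """Map (B, 21, 3) joints to (B, 778, 3) vertices with a plain MLP."""
    root = joints[:, :1, :]                           # wrist joint as origin
    x = (joints - root).reshape(joints.shape[0], -1)  # root-relative, flattened
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)                # linear layer + ReLU
    w, b = params[-1]
    verts = (x @ w + b).reshape(-1, NUM_VERTICES, 3)  # root-relative vertices
    return verts + root                               # back to input frame

rng = np.random.default_rng(0)
params = init_mlp(rng, [NUM_JOINTS * 3, HIDDEN, HIDDEN, NUM_VERTICES * 3])
joints = rng.standard_normal((2, NUM_JOINTS, 3))      # dummy batch of skeletons
verts = skeleton_to_mesh(params, joints)
print(verts.shape)
```

Because the forward pass is only a few matrix multiplications, a model of this kind is cheap to evaluate, which is the general motivation for MLP-based mesh recovery. Predicting vertices relative to the wrist also makes the output follow any global translation of the input skeleton.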

Paper Structure (28 sections, 9 equations, 9 figures, 8 tables)

Figures (9)

  • Figure 1: Overview of MLPHand and its training strategy. MLPHand consists of (a) a multi-view hand skeleton estimator and (b) a multi-view geometry feature fusion prediction module. (c) shows the two training stages of our method.
  • Figure 2: (a) The order encoding module and (b) the offset regression module, both of which belong to the Skeleton2Mesh model. (c) shows the skeleton-based convex decomposition of the non-convex hand mesh (right) and the bone order (left); the detailed decomposition process is given in the supplemental material. (d) shows a toy example of cube transformation. (e) shows the bone-wise self-rotation and translation from the template to the current skeleton-aligned shape.
  • Figure 3: (a) depicts the alignment and concatenation operations applied to a given 3D point during cross-view geometry feature fusion (three views shown for simplicity). (b) shows the feature forward-propagation process in the MGFP module during reconstruction of the k-th bone's local mesh.
  • Figure 4: Qualitative results of the GSD module ablation. All examples come from the FreiHand test set. The left half shows the frontal view and the right half shows the side view. Within each half, from left to right, the first column shows results without GSD, the middle column shows results with GSD, and the last column shows the ground-truth meshes.
  • Figure 5: Qualitative results on the DexYCB-MV dataset.
  • ...and 4 more figures