M3DHMR: Monocular 3D Hand Mesh Recovery

Yihong Lin, Xianjia Wu, Xilai Wang, Jianqiao Hu, Songju Lei, Xiandong Li, Wenxiong Kang

TL;DR

Monocular 3D hand mesh recovery is complicated by the hand's high degrees of freedom and by 2D-to-3D ambiguity, which makes accurate real-time reconstruction difficult. The authors introduce M3DHMR. It uses a two-stack hourglass network to extract 2D cues, then a spiral decoder built from Dynamic Spiral Convolution (DSC) layers plus a Region of Interest (ROI) Layer to directly regress camera-space hand mesh vertices. On FreiHAND it outperforms state-of-the-art real-time methods. Extensive ablations show that adaptive per-vertex weighting and region-based refinement substantially improve accuracy while maintaining efficiency. The work offers a practical pipeline for accurate, real-time hand mesh recovery, with possible extensions using generative priors to address depth ambiguity.
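A spiral decoder operates on fixed-length spiral sequences of vertex indices. The Python below is a rough sketch that assumes a triangle mesh given as a face list. For each vertex it builds a simplified sequence: the vertex itself, then its 1-ring, then its 2-ring, truncated or padded to a fixed length. The helper names `build_adjacency` and `spiral_sequence` are hypothetical. Real spiral operators also order each ring consistently around the vertex, whereas this sketch sorts the ring by index.

```python
def build_adjacency(faces, num_vertices):
    """Collect the undirected 1-ring neighbours of every vertex from triangle faces."""
    adj = [set() for _ in range(num_vertices)]
    for a, b, c in faces:
        adj[a].update((b, c))
        adj[b].update((a, c))
        adj[c].update((a, b))
    return adj


def spiral_sequence(adj, start, length):
    """Simplified spiral: the start vertex, then its neighbours ring by ring.

    Assumption: neighbours inside a ring are sorted by index for determinism;
    true spiral operators order them consistently around the vertex instead.
    """
    seq, seen, frontier = [start], {start}, [start]
    while len(seq) < length and frontier:
        # Next ring = unseen neighbours of every vertex in the current ring.
        next_ring = sorted({n for v in frontier for n in adj[v]} - seen)
        seq.extend(next_ring)
        seen.update(next_ring)
        frontier = next_ring
    # Pad with the start vertex when the mesh is too small to fill the spiral.
    seq += [start] * (length - len(seq))
    return seq[:length]


# A unit square split into two triangles.
adj = build_adjacency([(0, 1, 2), (0, 2, 3)], num_vertices=4)
spirals = [spiral_sequence(adj, v, length=4) for v in range(4)]
```

On this toy square, the spiral of vertex 1 is [1, 0, 2, 3]. Because every spiral has the same length L, all vertices can share one weight matrix of shape C_out × (L · C_in).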

Abstract

Monocular 3D hand mesh recovery is challenging due to the high degrees of freedom of hands, 2D-to-3D ambiguity and self-occlusion. Most existing methods are either inefficient or predict the positions of 3D mesh vertices less directly. We therefore propose a new pipeline called Monocular 3D Hand Mesh Recovery (M3DHMR) that directly estimates the positions of hand mesh vertices. M3DHMR extracts 2D cues from a single image to support the 3D task. It then uses a new spiral decoder consisting of several Dynamic Spiral Convolution (DSC) Layers and a Region of Interest (ROI) Layer. On the one hand, the DSC Layers adaptively adjust their weights based on the vertex positions and extract vertex features in both the spatial and channel dimensions. On the other hand, the ROI Layer uses physical information and refines the mesh vertices in each predefined hand region separately. Extensive experiments on the popular FreiHAND dataset demonstrate that M3DHMR significantly outperforms state-of-the-art real-time methods.
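One way to read "DSC Layers adaptively adjust the weights based on the vertex positions" is as a set of candidate spiral kernels blended per vertex. The sketch below assumes that reading, in the style of dynamic convolution. Each vertex gathers the features along its spiral and scores K candidate kernels from its 3D position, using hypothetical gating vectors. It then blends the kernel responses with the resulting softmax weights. This is an illustrative assumption, not the paper's exact formulation, and it omits the channel-dimension processing the abstract mentions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a [rows][cols] matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def dynamic_spiral_conv(features, positions, spirals, kernels, gates):
    """Hypothetical dynamic spiral convolution (mixture-of-kernels reading).

    features:  [V][C_in] per-vertex input features
    positions: [V][3] per-vertex 3D positions that drive the weighting
    spirals:   [V][L] spiral index sequences (spirals[v][0] == v)
    kernels:   K candidate weight matrices, each [C_out][L * C_in]
    gates:     K gating vectors, each [3], scoring each kernel from a position
    """
    out = []
    for v, spiral in enumerate(spirals):
        # Spatial aggregation: concatenate the features along the spiral.
        gathered = [x for idx in spiral for x in features[idx]]
        # Vertex-specific attention over the K kernels, computed from its position.
        alpha = softmax([sum(g * p for g, p in zip(gate, positions[v]))
                         for gate in gates])
        # Blend the responses; equivalent to applying sum_k alpha_k * W_k.
        responses = [matvec(w, gathered) for w in kernels]
        out.append([sum(a * r[c] for a, r in zip(alpha, responses))
                    for c in range(len(responses[0]))])
    return out


feats = [[1.0], [3.0]]
pos = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
spirals = [[0, 1], [1, 0]]
kernels = [[[1.0, 0.0]], [[0.0, 1.0]]]  # kernel 0 reads the centre, kernel 1 the neighbour
uniform = dynamic_spiral_conv(feats, pos, spirals, kernels, gates=[[0.0, 0.0, 0.0]] * 2)
gated = dynamic_spiral_conv(feats, pos, spirals, kernels,
                            gates=[[10.0, 0.0, 0.0], [-10.0, 0.0, 0.0]])
```

With zero gates, every vertex averages the two kernels and outputs 2.0. With position-dependent gates, vertex 0 (at x = 1) selects the centre kernel and vertex 1 (at x = -1) selects the neighbour kernel. The same layer therefore applies different effective weights at different vertices.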

Paper Structure

This paper contains 19 sections, 7 equations, 5 figures and 4 tables.

Figures (5)

  • Figure 1: Qualitative results of our M3DHMR. We show the 2D pose, the mesh projection, and the camera-space mesh and pose in metres. The red rectangle indicates the camera.
  • Figure 2: Overview of our M3DHMR framework. The ROI Layer is built on a dilated spiral convolution. Different colors in the Dynamic Spiral Convolution indicate different weights corresponding to the spiral convolution kernels.
  • Figure 3: The preset regions for the ROI Layer. Different colors denote different regions. The dilated spiral convolution in the ROI Layer operates only within each region.
  • Figure 4: Qualitative results of our predicted 2D pose, front-view mesh and side-view mesh on the FreiHAND test set.
  • Figure 5: Qualitative comparison of M3DHMR, CMR and MobRecon for front-view mesh prediction on the FreiHAND test set. Shortcomings of the latter two methods are marked with red circles.
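Figures 2 and 3 describe the ROI Layer as a dilated spiral convolution confined to predefined hand regions. The sketch below makes two assumptions. First, confinement means dropping spiral entries outside the centre vertex's region before applying the dilation. Second, refinement is a residual offset added to vertex coordinates. `region_of`, `region_weights`, `dilated_region_spiral` and `refine_by_region` are hypothetical names. The paper's layer may operate on features rather than on raw coordinates.

```python
def dilated_region_spiral(spiral, region_of, dilation, length):
    """ROI-style spiral: keep only entries in the centre vertex's region,
    then take every `dilation`-th one. The centre (spiral[0]) always survives."""
    centre = spiral[0]
    inside = [idx for idx in spiral if region_of[idx] == region_of[centre]]
    dilated = inside[::dilation]
    # Pad with the centre so every vertex yields a fixed-length sequence.
    dilated += [centre] * (length - len(dilated))
    return dilated[:length]


def refine_by_region(positions, spirals, region_of, region_weights, dilation, length):
    """Residual refinement with separate (hypothetical) weights per region.

    region_weights[r] is a [3][length * 3] matrix. A vertex only reads the
    coordinates of vertices in its own region and only uses its region's weights.
    """
    refined = []
    for v, spiral in enumerate(spirals):
        roi = dilated_region_spiral(spiral, region_of, dilation, length)
        gathered = [c for idx in roi for c in positions[idx]]
        w = region_weights[region_of[v]]
        offset = [sum(a * x for a, x in zip(row, gathered)) for row in w]
        refined.append([p + o for p, o in zip(positions[v], offset)])
    return refined


# Toy check: a weight matrix that copies the second spiral entry's coordinates.
pick_second = [[1.0 if j == 3 + i else 0.0 for j in range(6)] for i in range(3)]
positions = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]]
spirals = [[0, 1], [1, 0]]
same_region = refine_by_region(positions, spirals, [0, 0], {0: pick_second}, 1, 2)
split_regions = refine_by_region(positions, spirals, [0, 1],
                                 {0: pick_second, 1: pick_second}, 1, 2)
```

In the toy check, vertex 0's refined position is [1, 2, 3] when both vertices share a region. When vertex 1 is moved into a different region, vertex 0's result becomes [0, 0, 0], because its spiral can no longer cross the region boundary. Keeping spirals and weights per region is one plausible way to stop errors in one part of the hand from leaking into the refinement of another.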