M3DHMR: Monocular 3D Hand Mesh Recovery
Yihong Lin, Xianjia Wu, Xilai Wang, Jianqiao Hu, Songju Lei, Xiandong Li, Wenxiong Kang
TL;DR
Monocular 3D hand mesh recovery faces high degrees of freedom and 2D-3D ambiguity, complicating real-time reconstruction. The authors introduce M3DHMR, which uses a two-stack hourglass to extract 2D cues, then a spiral decoder built from Dynamic Spiral Convolution (DSC) Layers and a Region of Interest (ROI) Layer to directly regress camera-space hand mesh vertices, achieving state-of-the-art real-time performance on FreiHAND. Through extensive ablations, they show that adaptive per-vertex weighting and region-based refinement substantially boost accuracy while maintaining efficiency. The work offers a practical pipeline for accurate, real-time hand mesh recovery with potential extensions via generative priors to address depth ambiguity.
Abstract
Monocular 3D hand mesh recovery is challenging due to the high degrees of freedom of hands, 2D-to-3D ambiguity, and self-occlusion. Most existing methods are either inefficient or less straightforward for predicting the positions of 3D mesh vertices. Thus, we propose a new pipeline called Monocular 3D Hand Mesh Recovery (M3DHMR) to directly estimate the positions of hand mesh vertices. M3DHMR provides 2D cues for 3D tasks from a single image and uses a new spiral decoder consisting of several Dynamic Spiral Convolution (DSC) Layers and a Region of Interest (ROI) Layer. On the one hand, the DSC Layers adaptively adjust their weights based on the vertex positions and extract vertex features in both the spatial and channel dimensions. On the other hand, the ROI Layer utilizes physical information and refines the mesh vertices in each predefined hand region separately. Extensive experiments on the popular FreiHAND dataset demonstrate that M3DHMR significantly outperforms state-of-the-art real-time methods.
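For readers unfamiliar with spiral decoders, the following is a minimal sketch of a plain (static) spiral convolution, the general mesh operator that spiral decoders build on. It is not the DSC Layer itself: the abstract does not specify how the adaptive, position-dependent weights are computed, so that part is omitted. The function name spiral_conv, the toy three-vertex mesh, and the weight values are all hypothetical and chosen only for illustration.

```python
def spiral_conv(features, spirals, weight, bias):
    """Static spiral convolution over mesh vertices (illustrative only).

    features: list of V rows, each with C_in floats (one row per vertex)
    spirals:  list of V rows, each with L vertex indices (a fixed spiral
              ordering of each vertex's neighbourhood, starting at itself)
    weight:   list of L * C_in rows, each with C_out floats
    bias:     list of C_out floats
    Returns a list of V rows, each with C_out floats.
    """
    out = []
    for spiral in spirals:
        # Concatenate the features of the vertices visited along the spiral.
        gathered = [x for idx in spiral for x in features[idx]]
        # Apply one shared linear map to the concatenated vector.
        row = [
            sum(g * weight[k][j] for k, g in enumerate(gathered)) + bias[j]
            for j in range(len(bias))
        ]
        out.append(row)
    return out


# Toy mesh: a single triangle, one feature channel, spiral length 3.
features = [[1.0], [2.0], [3.0]]
spirals = [[0, 1, 2], [1, 2, 0], [2, 0, 1]]
weight = [[0.5], [0.25], [0.25]]  # hypothetical weights: centre counts double
bias = [0.0]

result = spiral_conv(features, spirals, weight, bias)
print(result)  # [[1.75], [2.0], [2.25]]
```

In this static form the same weight matrix is shared by every vertex. According to the abstract, the DSC Layers differ in that they adjust the weights based on vertex positions.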
