Fine-Grained Multi-View Hand Reconstruction Using Inverse Rendering
Qijun Gan, Wentong Li, Jinwei Ren, Jianke Zhu
TL;DR
The paper tackles fine-grained multi-view hand reconstruction by integrating a GCN-driven MANO parameter estimator with a Hand Albedo and Mesh (HAM) optimization and a mesh-based neural renderer. This coarse-to-fine pipeline preserves mesh topology while recovering detailed geometry and textures, aided by inverse rendering and a pre-trained neural renderer with dataset-specific fine-tuning. Experiments on InterHand2.6M, DeepHandMesh, and a high-resolution self-collected dataset show improvements in both 3D reconstruction accuracy and rendering quality, with ablations confirming the contributions of HAM and joint refinement. The approach offers an efficient, topology-consistent solution for realistic hand synthesis and animation, with public code and data to support reproducibility.
Abstract
Reconstructing high-fidelity hand models with intricate textures plays a crucial role in enhancing human-object interaction and advancing real-world applications. Although state-of-the-art methods excel at texture generation and image rendering, they often struggle to capture geometric details accurately. Learning-based approaches usually offer better robustness and faster inference, but they tend to produce over-smoothed results and require substantial amounts of training data. To address these issues, we present a novel fine-grained multi-view hand mesh reconstruction method that leverages inverse rendering to recover hand poses and intricate details. First, our approach predicts a parametric hand mesh model from multi-view images using a Graph Convolutional Network (GCN)-based method. We then introduce a novel Hand Albedo and Mesh (HAM) optimization module that refines both the hand mesh and its textures while preserving the mesh topology. In addition, we propose an effective mesh-based neural rendering scheme that simultaneously generates photo-realistic images and optimizes mesh geometry by fusing a pre-trained rendering network with vertex features. We conduct comprehensive experiments on InterHand2.6M, DeepHandMesh, and a dataset we collected ourselves. The results show that our approach outperforms state-of-the-art methods in both reconstruction accuracy and rendering quality. Code and dataset are publicly available at https://github.com/agnJason/FMHR.
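To make the idea of topology-preserving refinement concrete, the following is a minimal sketch and not the authors' HAM implementation. It assumes a coarse mesh is already available (for example, from a parametric model) and refines it by optimizing per-vertex offsets while leaving the face list untouched. The photometric inverse-rendering loss is replaced here by a direct geometric target, purely as a stand-in, and a uniform Laplacian term regularizes the offsets. The function names `uniform_laplacian` and `refine_vertices`, the step size, and the weights are all hypothetical choices for illustration.

```python
import numpy as np


def uniform_laplacian(num_vertices, faces):
    """Dense uniform graph Laplacian L = I - D^-1 A built from triangle faces."""
    adj = np.zeros((num_vertices, num_vertices))
    for a, b, c in faces:
        for i, j in ((a, b), (b, c), (c, a)):
            adj[i, j] = adj[j, i] = 1.0
    deg = adj.sum(axis=1, keepdims=True)
    return np.eye(num_vertices) - adj / np.maximum(deg, 1.0)


def refine_vertices(coarse, faces, target, steps=500, lr=0.1, smooth_weight=0.1):
    """Refine a coarse mesh by gradient descent on per-vertex offsets.

    ASSUMPTION: `target` stands in for the supervision that an inverse
    renderer would provide through image losses; it is not part of the paper.
    Only vertex positions change; `faces` is returned unmodified, so the
    mesh topology is preserved by construction.
    """
    lap = uniform_laplacian(len(coarse), faces)
    offsets = np.zeros_like(coarse)
    for _ in range(steps):
        residual = coarse + offsets - target
        # Gradient of 0.5*||residual||^2 + 0.5*smooth_weight*||L @ offsets||^2
        grad = residual + smooth_weight * lap.T @ (lap @ offsets)
        offsets -= lr * grad
    return coarse + offsets, faces
```

Calling `refine_vertices(coarse, faces, target)` with an `(N, 3)` vertex array and an `(F, 3)` integer face array returns refined vertices together with the same faces. In a real pipeline the data term would come from a differentiable renderer comparing rendered and captured multi-view images rather than from a known geometric target.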
