Fine-Grained Multi-View Hand Reconstruction Using Inverse Rendering

Qijun Gan, Wentong Li, Jinwei Ren, Jianke Zhu

TL;DR

The paper tackles fine-grained multi-view hand reconstruction by integrating a GCN-driven MANO parameter estimator with a Hand Albedo and Mesh (HAM) optimization and a mesh-based neural renderer. This coarse-to-fine pipeline preserves mesh topology while recovering detailed geometry and textures, aided by inverse rendering and a pre-trained neural renderer with dataset-specific fine-tuning. Experiments on InterHand2.6M, DeepHandMesh, and a high-resolution self-collected dataset show improvements in both 3D reconstruction accuracy and rendering quality, with ablations confirming the contributions of HAM and joint refinement. The approach offers an efficient, topology-consistent solution for realistic hand synthesis and animation, with public code and data to support reproducibility.

Abstract

Reconstructing high-fidelity hand models with intricate textures plays a crucial role in enhancing human-object interaction and advancing real-world applications. Although state-of-the-art methods excel at texture generation and image rendering, they often struggle to capture geometric details accurately. Learning-based approaches usually offer better robustness and faster inference, but they tend to produce smoother results and require substantial amounts of training data. To address these issues, we present a novel fine-grained multi-view hand mesh reconstruction method that leverages inverse rendering to restore hand poses and intricate details. First, our approach predicts a parametric hand mesh model from multi-view images using a Graph Convolutional Network (GCN)-based method. We further introduce a novel Hand Albedo and Mesh (HAM) optimization module that refines both the hand mesh and textures while preserving the mesh topology. In addition, we propose an effective mesh-based neural rendering scheme that simultaneously generates photo-realistic images and optimizes mesh geometry by fusing the pre-trained rendering network with vertex features. We conduct comprehensive experiments on InterHand2.6M, DeepHandMesh, and a dataset we collected ourselves; the promising results show that our approach outperforms state-of-the-art methods in both reconstruction accuracy and rendering quality. Code and dataset are publicly available at https://github.com/agnJason/FMHR.
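
The abstract's notion of inverse rendering can be illustrated with a deliberately tiny example: render with the current estimate, compare against the observations, and follow the gradient of the photometric error. The sketch below is not the paper's HAM module. It assumes a plain Lambertian shading model, known normals and light directions, and per-vertex scalar albedo as the only unknown; the names `lambert`, `render`, and `fit_albedo` are hypothetical and do not come from the paper or its code release.

```python
def lambert(normal, light):
    """Lambertian shading factor max(0, n . l) for unit vectors."""
    return max(0.0, sum(n * l for n, l in zip(normal, light)))


def render(albedo, normal, light):
    """Forward model (assumed): intensity = albedo * shading factor."""
    return albedo * lambert(normal, light)


def fit_albedo(normals, lights, observed, steps=500, lr=0.5):
    """Recover per-vertex albedo by gradient descent on a photometric L2 loss.

    observed[k][i] is the intensity of vertex i under light k.
    """
    albedo = [0.5] * len(normals)  # neutral initial guess
    for _ in range(steps):
        for i, normal in enumerate(normals):
            grad = 0.0
            for k, light in enumerate(lights):
                s = lambert(normal, light)
                residual = albedo[i] * s - observed[k][i]
                grad += 2.0 * residual * s  # d/d(albedo) of residual**2
            albedo[i] -= lr * grad / len(lights)
    return albedo


# Synthetic "capture": three vertices seen under three known lights.
normals = [(0.0, 0.0, 1.0), (0.0, 0.6, 0.8), (0.6, 0.0, 0.8)]
lights = [(0.0, 0.0, 1.0), (0.0, 0.8, 0.6), (0.8, 0.0, 0.6)]
true_albedo = [0.9, 0.4, 0.65]
observed = [
    [render(a, n, l) for a, n in zip(true_albedo, normals)]
    for l in lights
]

recovered = fit_albedo(normals, lights, observed)
print([round(a, 4) for a in recovered])
```

In the full method described above, the unknowns also include mesh geometry, and a neural renderer takes the place of this closed-form shading model. The loop of rendering, comparing, and updating is the part this toy example is meant to convey.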

Paper Structure (15 sections, 17 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: Our proposed approach focuses on reconstructing hands from multi-view images, allowing for the generation of precise poses, geometry, and photo-realistic rendering.
  • Figure 2: Overview of our coarse-to-fine framework. Given a set of calibrated images, we initialize MANO parameters and refine the mesh using our proposed HAM module and inverse rendering to recover geometric details. By jointly optimizing the mesh with model-based neural rendering, we obtain a fine-grained mesh along with its hyper-realistic rendered images.
  • Figure 3: Our GCN-based network. The four-layer GCN progressively doubles the number of vertices, and the MANO head outputs the corresponding MANO parameters.
  • Figure 4: Qualitative performance comparison. We show rendering results for a single hand (first two rows) and two hands (last four rows), optimized and trained from 10-view images. The hands rendered in pure white show the shading, highlighting the level of mesh detail.
  • Figure 5: Comparison of mesh quality. The generated meshes are compared in terms of geometric quality using 5 different views on the DeepHandMesh dataset (Moon et al., ECCV 2020). JR denotes Joint Refinement.
  • ...and 3 more figures