HandGCAT: Occlusion-Robust 3D Hand Mesh Reconstruction from Monocular Images

Shuaibing Wang, Shunli Wang, Dingkang Yang, Mingcheng Li, Ziyun Qian, Liuzhen Su, Lihua Zhang

TL;DR

HandGCAT is a novel 3D hand mesh reconstruction network that exploits hand prior knowledge as compensation information to enhance occluded-region features, reaching state-of-the-art performance.

Abstract

We propose a robust and accurate method for reconstructing 3D hand meshes from monocular images. This is a very challenging problem, as hands are often severely occluded by objects. Previous works have often disregarded 2D hand pose information, which contains hand prior knowledge that is strongly correlated with occluded regions. In this work, we therefore propose HandGCAT, a novel 3D hand mesh reconstruction network that fully exploits the hand prior as compensation information to enhance occluded-region features. Specifically, we design the Knowledge-Guided Graph Convolution (KGC) module and the Cross-Attention Transformer (CAT) module. KGC extracts hand prior information from the 2D hand pose by graph convolution. CAT fuses the hand prior into occluded regions by considering their high correlation. Extensive experiments on popular datasets with challenging hand-object occlusions, such as HO3D v2, HO3D v3, and DexYCB, demonstrate that HandGCAT reaches state-of-the-art performance. The code is available at https://github.com/heartStrive/HandGCAT.
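To make the KGC idea concrete, the sketch below applies one generic graph-convolution layer to a 2D hand pose. It is not the authors' implementation: the 21-joint ordering (wrist plus four joints per finger), the symmetric adjacency normalization, the ReLU, and the 16-channel output are all assumptions chosen for illustration, and the pose and weights are random placeholders.

```python
import numpy as np

NUM_JOINTS = 21  # assumed layout: joint 0 is the wrist, then 4 joints per finger


def hand_edges():
    """Build skeleton edges for the assumed 21-joint layout (hypothetical ordering)."""
    edges = []
    for finger in range(5):
        base = 1 + 4 * finger
        edges.append((0, base))  # wrist to the first joint of the finger
        for k in range(3):
            edges.append((base + k, base + k + 1))  # chain along the finger
    return edges


def normalized_adjacency(num_nodes, edges):
    """Return D^{-1/2} (A + I) D^{-1/2}, the usual GCN normalization."""
    a = np.eye(num_nodes)
    for i, j in edges:
        a[i, j] = 1.0
        a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(x, a_hat, w):
    """One graph-convolution layer: aggregate neighbours, project, apply ReLU."""
    return np.maximum(a_hat @ x @ w, 0.0)


rng = np.random.default_rng(0)
pose_2d = rng.uniform(0.0, 1.0, size=(NUM_JOINTS, 2))  # placeholder (x, y) per joint
w = rng.normal(size=(2, 16))                           # placeholder layer weights
a_hat = normalized_adjacency(NUM_JOINTS, hand_edges())
features = graph_conv(pose_2d, a_hat, w)               # per-joint prior features
```

Each row of `features` mixes a joint's coordinates with those of its skeletal neighbours, which is the sense in which a graph convolution can encode structural prior knowledge about the hand.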

Paper Structure

This paper contains 18 sections, 5 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: The main idea of the proposed HandGCAT is to exploit the hand prior knowledge to imagine occluded regions.
  • Figure 2: Overview of the proposed HandGCAT for 3D hand mesh reconstruction, which consists of a backbone, KGC, CAT, and a regressor. ResNet-50 with FPN extracts the image feature $F_I$. KGC captures hand prior knowledge $F_P$ from the 2D pose using GCNs. CAT fuses $F_P$ into $F_I$ and thus imagines occluded regions. Finally, the regressor reconstructs the 3D hand mesh.
  • Figure 3: Cross-Attention Transformer Block.
  • Figure 4: Qualitative comparison of the proposed HandGCAT and the state-of-the-art method HandOccNet on HO3D v2.
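
The fusion step described in the Figure 2 caption (CAT fuses $F_P$ into $F_I$) can be illustrated with a minimal single-head cross-attention sketch. This is an assumption-laden illustration rather than the block shown in Figure 3: it assumes queries come from the image tokens and keys/values from the prior tokens, adds a plain residual connection, and omits layer normalization, multiple heads, and the feed-forward sublayer. Token counts, the feature width, and all weights are placeholders.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(f_i, f_p, w_q, w_k, w_v):
    """Fuse prior tokens f_p into image tokens f_i.

    Assumed roles: queries from f_i, keys and values from f_p.
    Returns the fused tokens (with a residual) and the attention weights.
    """
    q = f_i @ w_q
    k = f_p @ w_k
    v = f_p @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product scores
    attn = softmax(scores, axis=-1)          # each image token attends over joints
    return f_i + attn @ v, attn


rng = np.random.default_rng(0)
dim = 32                                  # placeholder feature width
f_i = rng.normal(size=(64, dim))          # e.g. an 8x8 feature map flattened to tokens
f_p = rng.normal(size=(21, dim))          # one prior token per hand joint (assumed)
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) * 0.1 for _ in range(3))
fused, attn = cross_attention(f_i, f_p, w_q, w_k, w_v)
```

Under these assumptions, every image token receives a weighted mixture of joint-level prior features, so tokens covering occluded regions can draw on pose information that the image alone does not provide.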