STMR: Spiral Transformer for Hand Mesh Reconstruction

Huilong Xie, Wenwei Song, Wenxiong Kang, Yihong Lin

TL;DR

STMR tackles monocular hand mesh reconstruction by integrating spiral neighbor sampling into a Transformer framework to explicitly leverage mesh topology while maintaining efficiency with a single image encoder. It introduces MSPFE to extract rich pose features across scales and PPVL to map pose features to a compact, MANO-informed vertex representation. Reconstruction is performed by a Spiral Transformer-based 3D decoder that propagates topology-aware information through spiral neighbor relations. Extensive experiments on FreiHAND show state-of-the-art performance with fast inference, and ablations validate the benefits of MSPFE, PPVL, and the SW-MSA decoding strategy. Overall, the approach delivers a topology-aware, efficient solution for accurate hand mesh recovery from single-view images.

Abstract

Recent advancements in both transformer-based methods and spiral neighbor sampling techniques have greatly enhanced hand mesh reconstruction. Transformers excel at capturing complex vertex relationships, and spiral neighbor sampling is vital for exploiting topological structure. This paper integrates spiral sampling into the Transformer architecture, enhancing its ability to leverage mesh topology and yielding substantial accuracy gains in hand mesh reconstruction. STMR employs a single image encoder for model efficiency. To strengthen its information extraction capability, we design the multi-scale pose feature extraction (MSPFE) module, which extracts rich pose features. Moreover, the proposed predefined pose-to-vertex lifting (PPVL) method improves vertex feature representation, further boosting reconstruction performance. Extensive experiments on the FreiHAND dataset demonstrate that STMR achieves state-of-the-art performance and faster inference than methods with similar backbones. The code is available at https://github.com/SmallXieGithub/STMR.
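
To make the central idea more concrete, the sketch below shows one way attention can be restricted to a fixed-length spiral neighborhood on a triangle mesh. It is only an illustration under stated assumptions, not the STMR implementation: the function names (build_adjacency, spiral_sequence, spiral_attention) are hypothetical, the neighbor ordering is a simplified breadth-first approximation rather than a true rotational spiral, and the attention has a single head with no learned projections.

```python
import math
from collections import deque

def build_adjacency(num_vertices, faces):
    """Collect the set of neighboring vertices for every vertex of a triangle mesh."""
    adj = [set() for _ in range(num_vertices)]
    for a, b, c in faces:
        adj[a].update((b, c))
        adj[b].update((a, c))
        adj[c].update((a, b))
    return adj

def spiral_sequence(adj, center, length):
    """Return a fixed-length neighbor sequence starting at `center`.

    Assumption: vertices are visited breadth-first (nearer rings first,
    ascending index among a vertex's neighbors), which only approximates
    a rotational spiral ordering. Short sequences are padded with `center`.
    """
    seq, seen, queue = [], {center}, deque([center])
    while queue and len(seq) < length:
        v = queue.popleft()
        seq.append(v)
        for n in sorted(adj[v]):
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return seq + [center] * (length - len(seq))

def spiral_attention(features, spirals):
    """Scaled dot-product attention where each vertex attends only to its spiral sequence."""
    dim = len(features[0])
    out = []
    for v, seq in enumerate(spirals):
        q = features[v]
        scores = [sum(qi * ki for qi, ki in zip(q, features[n])) / math.sqrt(dim)
                  for n in seq]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * features[n][d] for w, n in zip(weights, seq))
                    for d in range(dim)])
    return out

# Toy mesh: a square split into two triangles (hypothetical data).
faces = [(0, 1, 2), (0, 2, 3)]
adj = build_adjacency(4, faces)
spirals = [spiral_sequence(adj, v, 3) for v in range(4)]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
updated = spiral_attention(features, spirals)
```

Because each output row is a softmax-weighted average over at most `length` neighbors, the cost per vertex depends on the spiral length rather than on the total number of mesh vertices, which is the kind of locality that topology-aware attention relies on.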

Paper Structure

This paper contains 31 sections, 9 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Accuracy vs. inference speed on the FreiHAND test set. The proposed method outperforms competing techniques with a similar-scale visual encoder, demonstrating superior speed and performance. Best viewed in color.
  • Figure 2: Overview of our STMR framework. Best viewed in color.
  • Figure 3: Details of MSPFE module. Here, we use three feature maps at different scales as an example. Best viewed in color.
  • Figure 4: Details of PPVL method. Best viewed in color.
  • Figure 5: Details of Spiral Transformer Block. Best viewed in color.
  • ...and 3 more figures