Table of Contents
Fetching ...

HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions

Hao Xu, Haipeng Li, Yinqiao Wang, Shuaicheng Liu, Chi-Wing Fu

TL;DR

This work presents HandBooster, a new approach to uplift the data diversity and boost the 3D hand-mesh reconstruction performance by training a conditional generative space on hand-object interactions and purposely sampling the space to synthesize effective data samples.

Abstract

Reconstructing 3D hand mesh robustly from a single image is very challenging, due to the lack of diversity in existing real-world datasets. While data synthesis helps relieve the issue, the syn-to-real gap still hinders its usage. In this work, we present HandBooster, a new approach to uplift the data diversity and boost the 3D hand-mesh reconstruction performance by training a conditional generative space on hand-object interactions and purposely sampling the space to synthesize effective data samples. First, we construct versatile content-aware conditions to guide a diffusion model to produce realistic images with diverse hand appearances, poses, views, and backgrounds; favorably, accurate 3D annotations are obtained for free. Then, we design a novel condition creator based on our similarity-aware distribution sampling strategies to deliberately find novel and realistic interaction poses that are distinctive from the training set. Equipped with our method, several baselines can be significantly improved beyond the SOTA on the HO3D and DexYCB benchmarks. Our code will be released on https://github.com/hxwork/HandBooster_Pytorch.

HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions

TL;DR

This work presents HandBooster, a new approach to uplift the data diversity and boost the 3D hand-mesh reconstruction performance by training a conditional generative space on hand-object interactions and purposely sampling the space to synthesize effective data samples.

Abstract

Reconstructing 3D hand mesh robustly from a single image is very challenging, due to the lack of diversity in existing real-world datasets. While data synthesis helps relieve the issue, the syn-to-real gap still hinders its usage. In this work, we present HandBooster, a new approach to uplift the data diversity and boost the 3D hand-mesh reconstruction performance by training a conditional generative space on hand-object interactions and purposely sampling the space to synthesize effective data samples. First, we construct versatile content-aware conditions to guide a diffusion model to produce realistic images with diverse hand appearances, poses, views, and backgrounds; favorably, accurate 3D annotations are obtained for free. Then, we design a novel condition creator based on our similarity-aware distribution sampling strategies to deliberately find novel and realistic interaction poses that are distinctive from the training set. Equipped with our method, several baselines can be significantly improved beyond the SOTA on the HO3D and DexYCB benchmarks. Our code will be released on https://github.com/hxwork/HandBooster_Pytorch.
Paper Structure (30 sections, 6 equations, 7 figures, 5 tables, 1 algorithm)

This paper contains 30 sections, 6 equations, 7 figures, 5 tables, 1 algorithm.

Figures (7)

  • Figure 1: HandBooster significantly improves several baselines, making them SOTA again on both HO3D and DexYCB.
  • Figure 2: Our HandBooster framework. (i) The Novel Condition Creator prepares and samples diverse and novel grasping poses against real-world and synthetic distributions to create conditions. (ii) The Hand-Object Image Synthesizer follows the conditions to generate image samples. (iii) The synthesized samples can then be employed to effectively train different types of reconstruction models.
  • Figure 3: Condition candidates. Normal and texture maps are more informative than others; see the red/yellow bounding boxes.
  • Figure 4: Visualizations in sampling. For three object categories in HO3D, we show the similarity of sampled real-world poses with our FPS or random sampling (left) and the histograms of similarity between sampled real-world and synthetic poses (right).
  • Figure 5: The mesh AUC comparison under different thresholds. All baselines are improved consistently after applying our data.
  • ...and 2 more figures