VR-based generation of photorealistic synthetic data for training hand-object tracking models

Chengyan Zhang, Rahul Chaudhari

TL;DR

This work introduces blender-hoisynth, an interactive, VR-enabled data generator built on Blender/UPBGE for photorealistic hand-object interaction data. It enables humans to manipulate objects with virtual hands in real time and exports rich ground-truth annotations, yielding the SynthDexYCB dataset. Through experiments with the gSDF hand-object reconstruction model, the authors demonstrate that replacing parts of real DexYCB data with synthetic data does not significantly degrade performance, supporting the practicality of synthetic data for HOI tasks. The approach promises scalable, controllable HOI data generation with broad applicability in 3D reconstruction, pose estimation, and tracking, while outlining avenues for future improvements like full finger control and haptics.

Abstract

Supervised learning models for precise tracking of hand-object interactions (HOI) in 3D require large amounts of annotated data for training. Moreover, it is not intuitive for non-experts to label 3D ground truth (e.g. 6DoF object pose) on 2D images. To address these issues, we present "blender-hoisynth", an interactive synthetic data generator based on the Blender software. Blender-hoisynth can scalably generate and automatically annotate visual HOI training data. Competing approaches usually generate synthetic HOI data completely without human input. While this may be beneficial in some scenarios, HOI applications inherently necessitate direct control over the HOIs as an expression of human intent. With blender-hoisynth, users can interact with objects via virtual hands using standard Virtual Reality hardware. The synthetically generated data are characterized by a high degree of photorealism and contain visually plausible and physically realistic videos of hands grasping objects and moving them around in 3D. To demonstrate the efficacy of our data generation, we replace large parts of the training data in the well-known DexYCB dataset with hoisynth data and train a state-of-the-art HOI reconstruction model with it. We show that there is no significant degradation in the model performance despite the data replacement.

Paper Structure (10 sections, 4 figures, 1 table)

This paper contains 10 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: Overview of blender-hoisynth and its usage. Fast iteration of HOI tracking model development can be achieved by customizing the data generator as required (dashed arrows) in a feedback loop.
  • Figure 2: Virtual hand mesh showing the hierarchical bone rig and the half-cylinder collision sensors.
  • Figure 3: This figure shows that the ground truth parameter distributions of the synthetic data (bottom row) are qualitatively similar to those of the real data. Parameters shown are distance of the object from the camera optical center, object azimuths and elevations in the camera coordinate frame, and visibility fraction of the objects.
  • Figure :
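
The Figure 3 caption compares distributions of object distance, azimuth, and elevation in the camera coordinate frame. As a rough illustration of how such per-frame statistics could be derived from a ground-truth object translation, here is a minimal sketch. The function name `object_view_stats`, the OpenCV-style axis convention (x right, y down, z forward), and the angle definitions are assumptions made for illustration; the paper's own conventions are not given in this summary.

```python
import math


def object_view_stats(t_cam):
    """Return (distance, azimuth_deg, elevation_deg) for an object centre.

    t_cam is the object translation (x, y, z) in the camera frame.
    Assumed convention (not taken from the paper): x right, y down,
    z forward along the optical axis.
    """
    x, y, z = t_cam
    # Euclidean distance from the camera optical centre.
    distance = math.sqrt(x * x + y * y + z * z)
    # Azimuth: horizontal angle away from the optical axis, positive to the right.
    azimuth = math.degrees(math.atan2(x, z))
    # Elevation: vertical angle above the x-z plane (y points down, so negate it).
    elevation = math.degrees(math.atan2(-y, math.hypot(x, z)))
    return distance, azimuth, elevation
```

Applying a function like this to every annotated frame of a real and a synthetic dataset, then histogramming the results, would give the kind of side-by-side comparison the caption describes.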