SynthoGestures: A Novel Framework for Synthetic Dynamic Hand Gesture Generation for Driving Scenarios

Amr Gomaa, Robin Zitt, Guillermo Reyes, Antonio Krüger

TL;DR

The paper addresses the data bottleneck for dynamic hand gesture recognition in driving contexts by introducing SynthoGestures, an Unreal Engine-based framework that synthesizes diverse, dynamic gestures across camera modalities (RGB, depth, infrared) and viewpoints. It combines configurable gesture variations (speed, hand shape, lighting) with both description-based and animation-based generation modes, guided by a spline and an IK-based Control Rig for realistic motion. Experimental results show that augmenting real data with synthetic data improves recognition performance when models are trained from scratch, particularly when the synthetic data covers substantial variation, suggesting a practical path to faster development of automotive gesture recognition systems. Overall, SynthoGestures offers a cost-effective, flexible tool for accelerating gesture data generation and model training, in automotive applications and potentially in other HCI domains.

Abstract

Creating a diverse and comprehensive dataset of hand gestures for dynamic human-machine interfaces in the automotive domain can be challenging and time-consuming. To overcome this challenge, we propose using synthetic gesture datasets generated by virtual 3D models. Our framework utilizes Unreal Engine to synthesize realistic hand gestures, offering customization options and reducing the risk of overfitting. Multiple variants, including gesture speed, performance, and hand shape, are generated to improve generalizability. In addition, we simulate different camera locations and types, such as RGB, infrared, and depth cameras, without incurring additional time and cost to obtain these cameras. Experimental results demonstrate that our proposed framework, SynthoGestures (https://github.com/amrgomaaelhady/SynthoGestures), improves gesture recognition accuracy and can replace or augment real-hand datasets. By saving time and effort in the creation of the dataset, our tool accelerates the development of gesture recognition systems for automotive applications.

Paper Structure (8 sections, 4 figures, 1 table)

This paper contains 8 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: We present a new 3D synthetic hand gesture generation framework, SynthoGestures, that provides a cost-effective and flexible approach for creating new variations of dynamic and static hand gestures. Our framework combines 3D modeling with a game engine (i.e., Unreal Engine) to produce multiple datasets with different camera positions (e.g., infotainment perspective, top view, and behind the wheel) and different camera types (e.g., RGB, infrared, and depth camera) with different noise modeling techniques.
  • Figure 2: Overview of our SynthoGestures framework for dynamic hand gesture synthesis. It takes initial settings from user input or reads them from a JSON file. These include generic recording settings such as the save file path, the number of recordings per gesture, the video resolution, and the default hand used to perform gestures. Next, the framework allows for the customization of gestures, including camera types, locations, hand shapes and positions, gesture speed, and lighting conditions. This enables the generation of diverse variations in performance for dynamic hand gestures. Additionally, it allows the user to specify only general settings and automatically loops over all possible gesture-specific settings to produce a comprehensive dataset with multiple variations.
  • Figure 3: An example of the character model with the spline of a rotation gesture (right) with the generated gesture sequence as a low frame rate depth image without noise (left).
  • Figure 4: Gesture recognition accuracy for different combinations of synthetic and real hand gestures. The prefix "Pre" denotes models pre-trained with the synthetic data, while "Mixed" denotes models trained from scratch with the combined synthetic and real data. The first number is the percentage of gesture variations (i.e., data size) for synthetically generated hand gestures, while the second number is the same for the real data.
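
The Figure 2 caption describes reading general settings from a JSON file and automatically looping over all gesture-specific settings. A minimal sketch of that enumeration step is given below in Python. It is an illustration only, not the framework's implementation (which runs inside Unreal Engine): the JSON key names, the lighting values, and the function `enumerate_recordings` are all hypothetical, and only the camera types and camera locations are taken from the captions above.

```python
import itertools
import json

# Hypothetical settings file content. The key names are illustrative
# assumptions and do not reflect the framework's actual JSON schema.
SETTINGS_JSON = """
{
  "general": {
    "output_path": "recordings",
    "recordings_per_gesture": 2,
    "resolution": [640, 480],
    "default_hand": "right"
  },
  "gesture_specific": {
    "camera_type": ["rgb", "infrared", "depth"],
    "camera_location": ["infotainment", "top_view", "behind_wheel"],
    "gesture_speed": [0.5, 1.0, 1.5],
    "lighting": ["day", "night"]
  }
}
"""


def enumerate_recordings(settings):
    """Return one job dict per combination of gesture-specific settings and take."""
    general = settings["general"]
    axes = settings["gesture_specific"]
    names = list(axes)
    jobs = []
    # Cartesian product over every gesture-specific axis.
    for combo in itertools.product(*(axes[name] for name in names)):
        # Repeat each combination once per requested recording.
        for take in range(general["recordings_per_gesture"]):
            job = dict(zip(names, combo))
            job["take"] = take
            job["hand"] = general["default_hand"]
            jobs.append(job)
    return jobs


settings = json.loads(SETTINGS_JSON)
jobs = enumerate_recordings(settings)
print(len(jobs), "recordings to generate")
print(jobs[0])
```

Because the job list grows multiplicatively with each added axis, a sweep of this kind can produce a large number of variations from a short settings file, which is the property the caption highlights.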