OmniRace: 6D Hand Pose Estimation for Intuitive Guidance of Racing Drone

Valerii Serpiva, Aleksey Fedoseev, Sausar Karaf, Ali Alridha Abdulkarim, Dzmitry Tsetserukou

Abstract

This paper presents OmniRace, an approach to controlling a racing drone with 6-degree-of-freedom (6-DoF) hand pose estimation and gesture recognition. To our knowledge, it is the first technology that allows low-level control of high-speed drones using gestures. OmniRace employs a gesture interface based on computer vision and a deep neural network to estimate a 6-DoF hand pose. The machine learning algorithm robustly interprets human gestures, allowing users to control drone motion intuitively. Real-time control of a racing drone demonstrates the effectiveness of the system and its potential for drone racing and other applications. Experiments conducted in the Gazebo simulation environment showed that OmniRace allows users to complete the UAV race track significantly faster (by 25.1%) and shortens the test drone path (from 102.9 m to 83.7 m). Users preferred the gesture interface for attractiveness (UEQ score of 1.57), hedonic quality (UEQ score of 1.56), and lower perceived temporal demand (NASA-TLX score of 32.0), while noting the high efficiency (UEQ score of 0.75) and low physical demand (NASA-TLX score of 19.0) of the baseline remote controller. The deep neural network attains an average accuracy of 99.75% on both normalized and raw datasets. OmniRace can potentially change the way humans interact with and navigate racing drones in dynamic and complex environments. The source code is available at https://github.com/SerValera/OmniRace.git.

Paper Structure (13 sections, 6 equations, 9 figures, 2 tables)

Figures (9)

  • Figure 1: (a) Visual estimation of human hand pose in 6-DoF and gesture recognition. (b) Experimental setup for evaluating the gesture control interface. (c) Drone race track with 10 gates in a simulation environment used for evaluating the control interfaces.
  • Figure 2: Algorithm pipeline for 6-DoF gesture-based drone race control.
  • Figure 3: Structure of the DNN training model: 42 input parameters (x, y coordinates of one-hand landmarks), hidden layers with 168 and 546 parameters, and an output layer with 8 parameters (representing gestures).
  • Figure 4: Graphs depicting accuracy (a) and loss (b) functions across multiple epochs. The yellow line represents validation data, while the blue line represents training data during the network learning process.
  • Figure 5: (a) Detection of 21 landmarks (x, y coordinates) of right-hand joint points. (b) Depth image of the hand with identified landmarks. (c) 3D hand pose and orientations calculated using ABCM points from color and depth images. (d) Transformed angles for drone orientation control.
  • ...and 4 more figures
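
The gesture classifier described in the Figure 3 and Figure 5(a) captions can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes that the 168 and 546 values are the widths of two consecutive hidden layers, that the hidden layers use ReLU and the output uses softmax, and that landmarks are normalized relative to the first landmark. None of these details is stated in the captions. The weights are random and untrained, and all function names are hypothetical. Only the 21 landmarks, 42 inputs, and 8 gesture outputs come from the source.

```python
import math
import random

NUM_LANDMARKS = 21               # hand landmarks, per Figure 5(a)
INPUT_SIZE = 2 * NUM_LANDMARKS   # 42 inputs (x, y per landmark), per Figure 3
HIDDEN_SIZES = (168, 546)        # ASSUMPTION: read as two hidden-layer widths
NUM_GESTURES = 8                 # output classes, per Figure 3


def normalize_landmarks(landmarks):
    """Flatten 21 (x, y) points into 42 values in [-1, 1].

    ASSUMPTION: coordinates are taken relative to the first landmark and
    divided by the largest absolute offset; the paper's scheme may differ.
    """
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    x0, y0 = landmarks[0]
    flat = []
    for x, y in landmarks:
        flat.extend((x - x0, y - y0))
    scale = max(abs(v) for v in flat) or 1.0  # avoid division by zero
    return [v / scale for v in flat]


def init_layer(n_in, n_out, rng):
    """Create one dense layer with Glorot-uniform weights and zero biases."""
    limit = math.sqrt(6.0 / (n_in + n_out))
    weights = [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def init_network(rng):
    """Build the 42 -> 168 -> 546 -> 8 layer stack with untrained weights."""
    sizes = (INPUT_SIZE, *HIDDEN_SIZES, NUM_GESTURES)
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def dense(layer, inputs):
    """Compute W @ inputs + b for one layer."""
    weights, biases = layer
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def softmax(logits):
    """Convert logits to probabilities, shifting by the max for stability."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(network, landmarks):
    """Return one probability per gesture class for a single hand."""
    activations = normalize_landmarks(landmarks)
    for layer in network[:-1]:
        activations = [max(0.0, v) for v in dense(layer, activations)]  # ReLU
    return softmax(dense(network[-1], activations))
```

Calling `predict(init_network(random.Random(0)), hand)` with `hand` as a list of 21 `(x, y)` tuples returns 8 probabilities that sum to 1. With untrained weights the probabilities carry no meaning, so the sketch only illustrates the data flow from landmarks to gesture scores.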