Intuitive Human-Robot Interface: A 3-Dimensional Action Recognition and UAV Collaboration Framework

Akash Chaudhary, Tiago Nascimento, Martin Saska

TL;DR

The paper addresses intuitive gesture-based UAV control by combining stereo RGB-D sensing with a three-module pipeline: 3D pose estimation from 2D poses, a lightweight action classifier built around an autoencoder–DTW–kNN framework, and UAV control that keeps the operator in view while executing the commanded gestures. Its contributions are a specialized method for lifting 2D poses to 3D, a feature space of single-joint, joint-pair, and tri-joint embeddings, and a scalable real-time classification scheme validated on public datasets and in real UAV deployments. The Encoded configuration reaches 86–97% accuracy with substantially less computation time than the Heavy configuration, and it outperforms several baselines while running in real time under field conditions. The approach is aimed at real-time, low-power human-robot interaction (HRI) with UAVs, where gestures provide a natural and reliable way to collaborate in dynamic environments.

Abstract

Harnessing human movements to command an Unmanned Aerial Vehicle (UAV) holds the potential to revolutionize UAV deployment, rendering it more intuitive and user-centric. In this research, we introduce a novel methodology for classifying three-dimensional human actions and using them to coordinate with a UAV in the field. Using a stereo camera, we obtain both RGB and depth data and extract three-dimensional human poses from the continuous video feed. These poses are then processed by our proposed k-nearest neighbour classifier, whose output dictates the behaviour of the UAV. The method also includes mechanisms that keep the human within the robot's field of view by tracking the user's movements. We tested our approach in multiple experiments with real robots. The results, together with a comprehensive analysis, demonstrate the efficacy and advantages of the proposed methodology.
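
To make the classification step more concrete, the following is a minimal sketch of nearest-neighbour classification over pose sequences using dynamic time warping (DTW) as the distance, which is the general idea behind the DTW and kNN stages named in the summary. It is not the authors' implementation: the autoencoder stage and the paper's joint features are omitted, and the function names (`dtw_distance`, `knn_classify`), the one-dimensional "wrist height" feature, the labels, and the toy sequences are all assumptions made purely for illustration.

```python
import math
from collections import Counter


def dtw_distance(a, b):
    """Textbook DTW cost between two sequences of equal-dimension feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame-to-frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats a frame
                                 cost[i][j - 1],      # b advances, a repeats a frame
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def knn_classify(query, examples, k=3):
    """Majority vote among the k labelled sequences closest to `query` under DTW."""
    ranked = sorted(examples, key=lambda ex: dtw_distance(query, ex[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: one feature per frame (a normalised wrist height).
raise_fast = [(0.0,), (0.5,), (1.0,), (1.0,)]
raise_slow = [(0.0,), (0.2,), (0.5,), (0.8,), (1.0,)]
lower_fast = [(1.0,), (0.5,), (0.0,), (0.0,)]
lower_slow = [(1.0,), (0.8,), (0.5,), (0.2,), (0.0,)]
examples = [
    (raise_fast, "raise"),
    (raise_slow, "raise"),
    (lower_fast, "lower"),
    (lower_slow, "lower"),
]

# A query performed at a different speed and length than any stored example.
query = [(0.0,), (0.1,), (0.6,), (1.0,), (1.0,), (1.0,)]
print(knn_classify(query, examples, k=3))
```

DTW lets sequences of different lengths and speeds be compared directly, which is why it pairs naturally with a nearest-neighbour vote. Its cost grows with the product of the two sequence lengths, so a real-time system would likely need a more compact representation of each sequence than raw per-frame features.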

Paper Structure

This paper contains 29 sections, 9 equations, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: A group of UAVs controlled by a human operator in an open field.
  • Figure 2: Flowchart depicting the action classifier and its use in UAV control, with the green blocks depicting our contribution.
  • Figure 3: Confusion Matrix of Encoded Variant with 6 Gesture Classes.
  • Figure 4: Confusion Matrix of Encoded Variant with 7 Gesture Classes.
  • Figure 5: A UAV being controlled by a human operator in an open field.