Intuitive Human-Robot Interface: A 3-Dimensional Action Recognition and UAV Collaboration Framework
Akash Chaudhary, Tiago Nascimento, Martin Saska
TL;DR
The paper tackles intuitive gesture-based UAV control by combining stereo RGB-D sensing with a three-module pipeline: 3D pose estimation from 2D cues, a lightweight yet robust action classifier built on an autoencoder, dynamic time warping (DTW), and k-nearest neighbour (kNN) framework, and UAV control that keeps the operator in view while executing the commanded actions. Its contributions are a specialized method for estimating 3D poses from 2D poses, a carefully designed feature space built from single-joint, joint-pair, and tri-joint embeddings, and a scalable, real-time classification scheme validated on public datasets and in real UAV deployments. Results show that the Encoded configuration achieves 86–97% accuracy at a substantially lower computation time than the Heavy configuration, outperforming several baselines and enabling real-time operation in field conditions. The approach is practical for real-time, low-power human-robot interaction (HRI) with UAVs, supporting natural and reliable gesture-based collaboration in dynamic environments.
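To make the classification stage more concrete, the following is a minimal sketch of a DTW-based kNN classifier over per-frame feature sequences. It leaves out the autoencoder and assumes the features are already computed, one vector per frame. The names `dtw_distance` and `knn_classify`, the Euclidean frame distance, and the default k=3 are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical layout: one feature vector per video frame.
Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two per-frame feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(s: Sequence[Frame], t: Sequence[Frame]) -> float:
    """Classic O(n*m) dynamic time warping cost between two sequences."""
    n, m = len(s), len(t)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning s[:i] with t[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(s[i - 1], t[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # s advances, t repeats a frame
                cost[i][j - 1],      # t advances, s repeats a frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def knn_classify(
    query: Sequence[Frame],
    train: List[Tuple[Sequence[Frame], str]],
    k: int = 3,
) -> str:
    """Return the majority label among the k DTW-nearest training sequences."""
    if not train:
        raise ValueError("training set is empty")
    # sorted() is stable, so equal distances keep training-set order.
    ranked = sorted(train, key=lambda item: dtw_distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    # most_common keeps first-encountered order among equal counts,
    # so a tied vote goes to the label of the nearer neighbour.
    return votes.most_common(1)[0][0]
```

Because DTW can repeat frames on either side, a gesture performed slower than its template still matches it; for example, `dtw_distance([[0.0], [1.0]], [[0.0], [0.0], [1.0], [1.0]])` is `0.0`. Each comparison costs O(n·m) frame distances, and each frame distance is linear in the feature dimension, so shrinking the per-frame vector (for example with an autoencoder) directly reduces classification time.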
Abstract
Using human movements to command an Unmanned Aerial Vehicle (UAV) has the potential to revolutionize UAV deployment, making it more intuitive and user-centric. In this research, we introduce a novel method that classifies three-dimensional human actions and uses them to coordinate with a UAV in the field. A stereo camera provides both RGB and depth data, from which we extract three-dimensional human poses in the continuous video feed. These poses are processed by our proposed k-nearest neighbour classifier, whose output dictates the behaviour of the UAV. The framework also includes mechanisms that keep the human within the UAV's field of view at all times by tracking the user's movements. We evaluated our approach through multiple experiments with real robots. The results and accompanying analysis demonstrate the efficacy and advantages of the proposed methodology.
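The abstract describes extracting 3D poses from stereo RGB and depth data. One simple way to do this, assuming the depth map is registered to the RGB image and the pinhole intrinsics (fx, fy, cx, cy) are known, is to back-project each 2D keypoint using the depth around that pixel. The sketch below shows that standard back-projection. It is not the paper's specialized 3D pose estimator, and `Intrinsics`, `sample_depth`, `lift_keypoints`, and the 3x3 median window are hypothetical choices.

```python
import math
from dataclasses import dataclass
from statistics import median
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class Intrinsics:
    """Hypothetical container for pinhole camera intrinsics (pixels)."""
    fx: float
    fy: float
    cx: float
    cy: float


def sample_depth(
    depth: Sequence[Sequence[float]], u: float, v: float, radius: int = 1
) -> Optional[float]:
    """Median of valid depths in a (2r+1)x(2r+1) window around pixel (u, v)."""
    rows, cols = len(depth), len(depth[0])
    col, row = int(round(u)), int(round(v))
    values = []
    for r in range(max(0, row - radius), min(rows, row + radius + 1)):
        for c in range(max(0, col - radius), min(cols, col + radius + 1)):
            z = depth[r][c]
            # Treat zero, negative, NaN and infinite depths as holes.
            if z > 0 and math.isfinite(z):
                values.append(z)
    return median(values) if values else None


def lift_keypoints(
    keypoints: Sequence[Tuple[float, float]],
    depth: Sequence[Sequence[float]],
    K: Intrinsics,
) -> List[Optional[Point3D]]:
    """Back-project 2D keypoints (u, v) into camera-frame 3D points."""
    points: List[Optional[Point3D]] = []
    for u, v in keypoints:
        z = sample_depth(depth, u, v)
        if z is None:
            points.append(None)  # no usable depth near this joint
            continue
        # Inverse pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy
        x = (u - K.cx) * z / K.fx
        y = (v - K.cy) * z / K.fy
        points.append((x, y, z))
    return points
```

Taking the median over a small window makes each lifted joint less sensitive to holes and speckle in stereo depth. Keypoints with no valid depth nearby are returned as `None`, so a downstream stage can skip or interpolate them before classification.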
