UAV Control with Vision-based Hand Gesture Recognition over Edge-Computing
Sousannah Abdalla, Sabur Baidya
TL;DR
The paper tackles the challenge of intuitive UAV control through vision-based hand gestures by comparing traditional cropping/segmentation methods with a landmark-based recognition approach. It integrates MediaPipe hand landmarks with a custom CNN classifier and further enhances detection range by incorporating YOLOv4, achieving up to 10 m distance, and leverages edge computing to reduce processing latency. The system is validated in AirSim and on a real DJI Tello drone, demonstrating 96.14% overall gesture accuracy, robust performance up to 5 m, and end-to-end latency around 150 ms, with reliable trajectory execution (92% path accuracy) under varied wind and lighting conditions. These results indicate that edge-assisted landmark-based gesture recognition provides a practical, real-time interface for UAV control in dynamic environments, enabling broader, more robust remote operation.
Abstract
Gesture recognition is a promising way to interface with unmanned aerial vehicles (UAVs) because it is intuitive and allows precise interaction. This research presents a comparative analysis of vision-based hand gesture detection methods tailored for UAV control. Existing gesture recognition approaches that rely on cropping, zooming, and color-based segmentation do not work well for this kind of application in dynamic conditions, and their performance degrades with increasing distance and environmental noise. We propose a novel approach that leverages hand landmark drawing and classification for gesture-recognition-based UAV control. Experimental results show that the proposed method outperforms the existing methods in accuracy, noise resilience, and efficacy across varying distances, thus providing robust control decisions. However, running compute-intensive, deep-learning-based gesture recognition algorithms on the UAV's onboard computer is challenging in terms of performance. Hence, we propose an edge-computing-based framework that offloads the heavier computing tasks, thus achieving closed-loop real-time performance. With implementations on the AirSim simulator and on a real-world UAV, we showcase the advantages of our end-to-end gesture-recognition-based UAV control system.
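One way to see why a landmark-drawing representation can be less sensitive to distance than cropping or color segmentation is sketched below. This is an illustrative assumption, not the authors' implementation: the function names `normalize_landmarks` and `rasterize`, the 32-pixel canvas, and the bounding-box normalization are all hypothetical. The sketch takes hand landmarks as (x, y) pairs in normalized image coordinates (the form MediaPipe reports), rescales them to their own bounding box, and draws them on a small binary canvas that a classifier could consume.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def normalize_landmarks(points: List[Point]) -> List[Point]:
    """Shift landmarks to their bounding-box origin and scale the longer side to 1.

    Hypothetical preprocessing step: it removes the hand's position and
    apparent size in the frame, which is what changes with distance.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    span = max(max(xs) - min_x, max(ys) - min_y)
    if span == 0:
        # Degenerate case (single point or identical points).
        return [(0.0, 0.0) for _ in points]
    return [((x - min_x) / span, (y - min_y) / span) for x, y in points]


def rasterize(points: List[Point], size: int = 32) -> List[List[int]]:
    """Draw the normalized landmarks as single pixels on a size x size binary canvas."""
    canvas = [[0] * size for _ in range(size)]
    for x, y in normalize_landmarks(points):
        col = min(int(x * (size - 1) + 0.5), size - 1)
        row = min(int(y * (size - 1) + 0.5), size - 1)
        canvas[row][col] = 1
    return canvas


if __name__ == "__main__":
    # Toy 4-point "hand" seen close up, and the same shape smaller and shifted,
    # as it would appear farther from the camera.
    near = [(0.2, 0.2), (0.6, 0.2), (0.6, 1.0), (0.4, 0.6)]
    far = [(0.5 + 0.25 * x, 0.1 + 0.25 * y) for x, y in near]
    for a, b in zip(normalize_landmarks(near), normalize_landmarks(far)):
        print(a, b)
```

Under these assumptions, the near and far versions of the same hand shape map to nearly identical normalized coordinates, so the drawn image handed to the classifier changes little with distance, provided the landmarks can still be detected at that range.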
