Visual Heading Prediction for Autonomous Aerial Vehicles
Reza Ahmari, Ahmad Mohammadi, Vahid Hemmati, Mohammed Mynuddin, Parham Kebria, Mahmoud Nabil Mahmoud, Xiaohong Yuan, Abdollah Homaifar
TL;DR
This work addresses real-time UAV-UGV coordination in GPS-denied environments using a vision-only pipeline that combines a fine-tuned YOLOv5 detector for UGVs with a compact ANN that regresses the UAV heading from monocular bounding-box features. A high-fidelity VICON-based dataset of over 13,000 labeled frames enables supervised training, and the resulting model achieves an MAE of 0.1506° and an RMSE of 0.1957° in heading prediction, with 95% UGV detection accuracy and 31 ms inference per frame. The approach removes reliance on external localization, enabling deployment on embedded hardware for GPS-denied multi-agent operations. It offers a scalable, lightweight solution for reliable aerial-ground coordination in dynamic, infrastructure-sparse scenarios.
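MAE and RMSE summarize heading-regression error as the mean absolute and the root-mean-square difference between predicted and ground-truth angles. The following is a minimal sketch of both metrics, assuming headings in degrees. The optional `wrap` flag and the `angular_diff` helper are hypothetical additions: the summary does not state whether errors were measured on the circle, so wrap-around handling is offered only as an option.

```python
import math


def angular_diff(pred: float, true: float) -> float:
    """Smallest signed difference pred - true in degrees, mapped into [-180, 180)."""
    return (pred - true + 180.0) % 360.0 - 180.0


def heading_errors(pred_deg, true_deg, wrap=False):
    """Return (MAE, RMSE) in degrees between predicted and ground-truth headings.

    wrap=False uses plain differences (angles assumed to share one range);
    wrap=True uses the shortest angular difference (hypothetical option).
    """
    if len(pred_deg) != len(true_deg) or not pred_deg:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [angular_diff(p, t) if wrap else p - t for p, t in zip(pred_deg, true_deg)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse


if __name__ == "__main__":
    # Toy values for illustration only; these are not results from the paper.
    mae, rmse = heading_errors([10.1, 45.0, 89.8], [10.0, 45.2, 90.0])
    print(f"MAE={mae:.4f} deg, RMSE={rmse:.4f} deg")
```

RMSE is never smaller than MAE because squaring weights large errors more heavily, which is consistent with the reported pair (0.1957° versus 0.1506°).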
Abstract
The integration of Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs) is increasingly central to the development of intelligent autonomous systems for applications such as search and rescue, environmental monitoring, and logistics. However, precise real-time coordination between these platforms is challenging, particularly when external localization infrastructure such as GPS or GNSS is unavailable or degraded [1]. This paper proposes a vision-based, data-driven framework for real-time UAV-UGV integration, with a focus on robust UGV detection and heading angle prediction for navigation and coordination. The system employs a fine-tuned YOLOv5 model to detect UGVs and extract bounding box features, which a lightweight artificial neural network (ANN) then uses to estimate the UAV's required heading angle. A VICON motion capture system provided ground-truth data for training, yielding a dataset of over 13,000 annotated images collected in a controlled lab environment. The trained ANN achieves a mean absolute error of 0.1506° and a root mean squared error of 0.1957°, providing accurate heading angle predictions from monocular camera input alone. Experimental evaluations show 95% accuracy in UGV detection. This work contributes a vision-based, infrastructure-independent solution with strong potential for deployment in GPS/GNSS-denied environments, supporting reliable multi-agent coordination under realistic dynamic conditions. A demonstration video showcasing the system's real-time performance, including UGV detection, heading angle prediction, and UAV alignment under dynamic conditions, is available at: https://github.com/Kooroshraf/UAV-UGV-Integration
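The two-stage pipeline described above (a detector produces a UGV bounding box, then a lightweight ANN regresses the heading from box-derived features) can be sketched for the second stage as follows. This is an illustrative sketch, not the authors' model. The feature set (normalized centre, normalized size, aspect ratio), the 16-unit ReLU hidden layer, plain per-sample SGD on squared error, and the names `BBox`, `bbox_features`, and `TinyMLP` are all hypothetical choices. A hand-specified box also stands in for YOLOv5 output.

```python
import random
from dataclasses import dataclass


@dataclass
class BBox:
    """Axis-aligned box in pixels: (x1, y1) top-left, (x2, y2) bottom-right (assumed format)."""
    x1: float
    y1: float
    x2: float
    y2: float


def bbox_features(box: BBox, img_w: int, img_h: int) -> list[float]:
    """Hypothetical feature vector: normalized centre, normalized size, aspect ratio."""
    w, h = box.x2 - box.x1, box.y2 - box.y1
    if w <= 0 or h <= 0:
        raise ValueError("degenerate bounding box")
    cx = (box.x1 + box.x2) / 2 / img_w
    cy = (box.y1 + box.y2) / 2 / img_h
    return [cx, cy, w / img_w, h / img_h, w / h]


class TinyMLP:
    """One-hidden-layer ReLU regressor mapping box features to a heading in degrees."""

    def __init__(self, n_in: int, n_hidden: int = 16, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        # Pre-activations and ReLU activations of the hidden layer.
        pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(self.w1, self.b1)]
        return pre, [max(0.0, p) for p in pre]

    def forward(self, x: list[float]) -> float:
        _, hid = self._hidden(x)
        return sum(w * h for w, h in zip(self.w2, hid)) + self.b2

    def train_step(self, x: list[float], y: float, lr: float = 1e-3) -> float:
        """One SGD step on 0.5 * (pred - y)^2; returns the loss before the update."""
        pre, hid = self._hidden(x)
        pred = sum(w * h for w, h in zip(self.w2, hid)) + self.b2
        err = pred - y
        # Backpropagate to the hidden layer using output weights from before this update.
        d_pre = [err * w if p > 0 else 0.0 for w, p in zip(self.w2, pre)]
        for j, h in enumerate(hid):
            self.w2[j] -= lr * err * h
        self.b2 -= lr * err
        for j, g in enumerate(d_pre):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * g * xi
            self.b1[j] -= lr * g
        return 0.5 * err * err


if __name__ == "__main__":
    feats = bbox_features(BBox(300, 200, 420, 260), img_w=640, img_h=480)
    model = TinyMLP(n_in=len(feats))
    for _ in range(500):
        model.train_step(feats, y=30.0)  # synthetic label, not data from the paper
    print(f"predicted heading: {model.forward(feats):.2f} deg")
```

In the described system, the box would come from the fine-tuned YOLOv5 detector and the training labels from VICON-derived ground truth. The single synthetic target of 30° here only demonstrates that the supervised update reduces the regression loss.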
