Visual Heading Prediction for Autonomous Aerial Vehicles

Reza Ahmari, Ahmad Mohammadi, Vahid Hemmati, Mohammed Mynuddin, Parham Kebria, Mahmoud Nabil Mahmoud, Xiaohong Yuan, Abdollah Homaifar

TL;DR

This work addresses real-time UAV-UGV coordination in GPS-denied environments with a vision-only pipeline: a fine-tuned YOLOv5 detector locates UGVs, and a compact ANN regresses the UAV heading from monocular bounding-box features. A high-fidelity VICON-based dataset of over 13,000 labeled frames provides ground truth for supervised training; the ANN achieves an MAE of 0.1506° and an RMSE of 0.1957°, with 95% UGV detection accuracy and 31 ms inference per frame. The approach removes reliance on external localization, which supports deployment on embedded hardware for GPS-denied multi-agent operations. It offers a scalable, lightweight solution for reliable aerial-ground coordination in dynamic, infrastructure-sparse scenarios.

Abstract

The integration of Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs) is increasingly central to the development of intelligent autonomous systems for applications such as search and rescue, environmental monitoring, and logistics. However, precise coordination between these platforms in real-time scenarios presents major challenges, particularly when external localization infrastructure such as GPS or GNSS is unavailable or degraded [1]. This paper proposes a vision-based, data-driven framework for real-time UAV-UGV integration, with a focus on robust UGV detection and heading angle prediction for navigation and coordination. The system employs a fine-tuned YOLOv5 model to detect UGVs and extract bounding box features, which are then used by a lightweight artificial neural network (ANN) to estimate the UAV's required heading angle. A VICON motion capture system was used to generate ground-truth data during training, resulting in a dataset of over 13,000 annotated images collected in a controlled lab environment. The trained ANN achieves a mean absolute error of 0.1506° and a root mean squared error of 0.1957°, offering accurate heading angle predictions using only monocular camera inputs. Experimental evaluations achieve 95% accuracy in UGV detection. This work contributes a vision-based, infrastructure-independent solution that demonstrates strong potential for deployment in GPS/GNSS-denied environments, supporting reliable multi-agent coordination under realistic dynamic conditions. A demonstration video showcasing the system's real-time performance, including UGV detection, heading angle prediction, and UAV alignment under dynamic conditions, is available at: https://github.com/Kooroshraf/UAV-UGV-Integration
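As a rough illustration of the two quantities the abstract refers to, the sketch below shows one way bounding-box features could be derived from a detection and how MAE and RMSE in degrees can be computed. It is a minimal sketch under stated assumptions, not the authors' implementation: the feature set (normalized center, width, and height), the wrap-around handling of angle errors, and the function names `bbox_features`, `angle_diff_deg`, and `mae_rmse_deg` are all hypothetical and do not come from the paper.

```python
import math


def bbox_features(x1, y1, x2, y2, img_w, img_h):
    """Turn a pixel-space box (corner format) into normalized features.

    Assumption: the regressor consumes normalized center, width and height.
    The abstract only says "bounding box features" without listing them.
    """
    cx = (x1 + x2) / 2.0 / img_w
    cy = (y1 + y2) / 2.0 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return [cx, cy, w, h]


def angle_diff_deg(pred, true):
    """Signed smallest difference between two angles, in [-180, 180)."""
    return (pred - true + 180.0) % 360.0 - 180.0


def mae_rmse_deg(preds, trues):
    """Mean absolute error and root mean squared error, in degrees.

    Assumption: errors are wrapped so that 359.9 vs 0.1 counts as 0.2,
    which the abstract does not specify.
    """
    errs = [angle_diff_deg(p, t) for p, t in zip(preds, trues)]
    mae = sum(abs(e) for e in errs) / len(errs)
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    return mae, rmse


if __name__ == "__main__":
    # Hypothetical detection in a 640x480 frame.
    print(bbox_features(100, 50, 300, 250, 640, 480))
    # Hypothetical predicted vs. ground-truth headings in degrees.
    print(mae_rmse_deg([10.1, 19.8], [10.0, 20.0]))
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same data, which is consistent with the reported 0.1957° RMSE sitting above the 0.1506° MAE.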

Paper Structure

This paper contains 29 sections, 7 equations, 15 figures, 2 tables.

Figures (15)

  • Figure 1: Overview of the proposed UAV-UGV coordination framework. The forward-facing camera (C1) is used for UGV detection and heading estimation. The ANN predicts alignment. The downward-facing camera (C2) confirms landing.
  • Figure 2: Top view of the laboratory scene showing VICON camera placement and UGV/UAV layout.
  • Figure 3: VICON Tracker interface showing UAV and UGV marker positions in the 3D capture space. The left panel presents a top-down (horizontal) view of the tracking volume, while the right panel shows a side (elevation) perspective, enabling spatial verification of relative height and position. The red and blue bounding boxes highlight the tracked UAV and UGV, respectively.
  • Figure 4: Data generation pipeline showing raw camera frames (bottom) aligned with VICON-recorded positional metadata (top).
  • Figure 5: Training and validation loss curves: (Left) Box loss representing localization error. (Right) Class loss representing classification error over epochs.
  • ...and 10 more figures