YoloTag: Vision-based Robust UAV Navigation with Fiducial Markers

Sourav Raxit, Simant Bahadur Singh, Abdullah Al Redwan Newaz

TL;DR

This paper tackles GPS-denied UAV localization by detecting fiducial markers with a lightweight YOLOv8-based detector and passing the detections to an efficient EPnP-based 4D pose estimator. To address noisy pose outputs, it introduces a Butterworth low-pass filter that smooths trajectories while balancing delay, validated in indoor real-robot experiments. Key contributions include (i) fast multi-marker detection with YOLOv8, (ii) robust pose estimation from multiple landmarks, (iii) noise reduction via a higher-order Butterworth filter, and (iv) comprehensive indoor benchmarks showing real-time performance (55 FPS) and improved trajectory accuracy over AprilTag and DeepTag. The work demonstrates a practical, real-time fiducial-marker localization system for GPS-denied UAV navigation and points to future object-based localization to broaden applicability.
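
As a rough illustration of the pose step summarized above, the sketch below converts one perspective-n-point result into a 4D state and fuses several per-marker estimates. It assumes that the 4D pose means (x, y, z, yaw), that `R` and `t` map world coordinates into a z-up body frame (camera-to-body extrinsics already applied), and that per-marker estimates are simply averaged. The function names are hypothetical, and the averaging is a stand-in rather than the paper's confirmed way of combining markers.

```python
import math

def pose_4d(R, t):
    """Turn a world-to-body rotation R (3x3 nested lists) and translation t
    (length 3) into (x, y, z, yaw) expressed in the world frame."""
    # Body origin in world coordinates: C = -R^T t
    position = [-sum(R[k][i] * t[k] for k in range(3)) for i in range(3)]
    # Body-to-world rotation is R^T, so yaw = atan2(R^T[1][0], R^T[0][0])
    yaw = math.atan2(R[0][1], R[0][0])
    return (position[0], position[1], position[2], yaw)

def fuse_poses(poses):
    """Average several (x, y, z, yaw) estimates (assumed fusion rule).
    Yaw uses a circular mean so angles near +pi and -pi do not cancel."""
    n = len(poses)
    x = sum(p[0] for p in poses) / n
    y = sum(p[1] for p in poses) / n
    z = sum(p[2] for p in poses) / n
    yaw = math.atan2(sum(math.sin(p[3]) for p in poses),
                     sum(math.cos(p[3]) for p in poses))
    return (x, y, z, yaw)
```

In practice `R` and `t` would come from a PnP solver applied to the detected marker corners; here they are plain nested lists so the sketch has no dependencies.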

Abstract

By harnessing fiducial markers as visual landmarks in the environment, Unmanned Aerial Vehicles (UAVs) can rapidly build precise maps and navigate spaces safely and efficiently, unlocking their potential for fluent collaboration and coexistence with humans. Existing fiducial marker methods rely on handcrafted feature extraction, which sacrifices accuracy. On the other hand, deep learning pipelines for marker detection fail to meet the real-time runtime constraints crucial for navigation applications. In this work, we propose YoloTag, a real-time fiducial marker-based localization system. YoloTag uses a lightweight YOLOv8 object detector to accurately detect fiducial markers in images while meeting the runtime constraints needed for navigation. The detected markers are then used by an efficient perspective-n-point algorithm to estimate UAV states. However, this localization system introduces noise, causing instability in trajectory tracking. To suppress noise, we design a higher-order Butterworth filter that effectively eliminates noise through frequency domain analysis. We evaluate our algorithm through real-robot experiments in an indoor environment, comparing the trajectory tracking performance of our method against other approaches in terms of several distance metrics.
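
The noise-suppression step can be sketched as a causal low-pass filter. The example below is a minimal second-order Butterworth section built from the standard bilinear-transform biquad formulas. The paper describes a higher-order design whose order and cutoff are not stated in this summary, so the 2 Hz cutoff, the 55 Hz sample rate (chosen to match the reported frame rate), and all function names are assumptions.

```python
import cmath
import math

def butterworth2_lowpass(cutoff_hz, sample_hz):
    """Normalized (b, a) coefficients of a second-order Butterworth low-pass."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_hz
    alpha = math.sin(w0) / math.sqrt(2.0)  # sin(w0) / (2Q) with Q = 1/sqrt(2)
    cosw = math.cos(w0)
    a0 = 1.0 + alpha
    b = [(1.0 - cosw) / 2.0 / a0, (1.0 - cosw) / a0, (1.0 - cosw) / 2.0 / a0]
    a = [1.0, -2.0 * cosw / a0, (1.0 - alpha) / a0]
    return b, a

def lowpass(samples, cutoff_hz, sample_hz):
    """Filter one coordinate of a trajectory, sample by sample (causal)."""
    if not samples:
        return []
    b, a = butterworth2_lowpass(cutoff_hz, sample_hz)
    # Start from steady state at the first sample to avoid a start-up jump.
    x1 = x2 = y1 = y2 = samples[0]
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

def response(b, a, freq_hz, sample_hz):
    """Magnitude (linear) and phase (degrees) of the filter at freq_hz."""
    zi = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_hz)  # z^-1
    h = (b[0] + b[1] * zi + b[2] * zi * zi) / (1.0 + a[1] * zi + a[2] * zi * zi)
    return abs(h), math.degrees(cmath.phase(h))

# Example: smooth a noisy x-coordinate sampled at an assumed 55 Hz.
noisy_x = [1.0 + 0.05 * (-1) ** k for k in range(100)]
smooth_x = lowpass(noisy_x, cutoff_hz=2.0, sample_hz=55.0)
```

A causal filter like this trades smoothness against delay: lowering the cutoff removes more noise but increases the phase lag, which is the balance the summary refers to.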

Paper Structure (10 sections, 9 equations, 4 figures, 1 table)

Figures (4)

  • Figure 1: The UAV's onboard camera captures images of the fiducial markers. YOLOv8 identifies these markers in the images and extracts their corner points, which a pose estimator uses to calculate the drone's position. A noise suppression module then refines the poses to establish the UAV's state precisely.
  • Figure 2: The blue line depicts a raw trajectory from the EPnP algorithm, while the orange line depicts the corresponding filtered trajectory from the Butterworth filter (first panel). The second panel demonstrates how a Fast Fourier Transform (FFT) is used to determine an appropriate cutoff frequency for the Butterworth filter. The phase plot (third panel) characterizes the phase shift the filter introduces across frequencies, specifically a $-150^\circ$ phase shift at $1$ rad/s. Similarly, the magnitude response (fourth panel) shows a gain of $-20$ dB at $1$ rad/s. Analyzing these frequency responses clarifies how the filter behaves and how it affects the trajectory data.
  • Figure 3: The dataset was generated by flying a Bebop2 within a closed environment measuring $6.10$ m × $5.85$ m × $2.44$ m. A Vicon motion capture system (Vicon setup panel) was employed to capture the UAV's motion, while the onboard camera was used to capture fiducial markers (Bebop2-with-marker panel), enabling the generation of ground truth data.
  • Figure 4: The figures show the ground truth trajectory from the Vicon system (red) compared with the raw trajectory from the AprilTag detector (green) and the filtered trajectories from the DeepTag (purple) and YoloTag (blue) detectors. The spiral-comparison and rectangular-comparison panels illustrate the performance of the AprilTag, DeepTag, and YoloTag detectors, alongside the Vicon system, in tracking spiral and rectangular eight-shaped trajectories. The YoloTag detector outperformed the AprilTag and DeepTag detectors in accurately estimating the UAV's state, demonstrating superior tracking capabilities for both trajectory profiles.
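
The trajectory comparison in Figure 4 can be quantified with distance metrics between an estimated path and the Vicon ground truth. The summary does not name the metrics the paper reports, so the two below (root-mean-square error over time-aligned samples and the symmetric Hausdorff distance) are assumed examples, and the function names are hypothetical.

```python
import math

def rmse(estimated, reference):
    """Root-mean-square position error between two time-aligned trajectories,
    each a list of (x, y, z) points of equal length."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("trajectories must be non-empty and equally long")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(estimated, reference))
    return math.sqrt(squared / len(estimated))

def hausdorff(path_a, path_b):
    """Symmetric Hausdorff distance: the largest gap from any point on one
    path to the nearest point on the other. No time alignment is needed."""
    def directed(u, v):
        return max(min(math.dist(p, q) for q in v) for p in u)
    return max(directed(path_a, path_b), directed(path_b, path_a))

# Example: an estimate offset by 0.1 m in y from a straight reference path.
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
estimate = [(x, y + 0.1, z) for x, y, z in reference]
print(rmse(estimate, reference), hausdorff(estimate, reference))
```

RMSE penalizes timing errors as well as shape errors, while the Hausdorff distance compares only the geometry of the two paths, so reporting both gives a fuller picture of tracking quality.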