Exploring Deep Reinforcement Learning for Robust Target Tracking using Micro Aerial Vehicles

Alberto Dionigi, Mirko Leomanni, Alessandro Saviolo, Giuseppe Loianno, Gabriele Costante

TL;DR

This work tackles robust target tracking for micro aerial vehicles using a model-free deep reinforcement learning approach that operates in output-feedback mode with relative-position measurements. It employs an asymmetric actor-critic framework (SAC) where the policy (A-DNN) maps history-augmented observations to continuous thrust/attitude commands, while a privileged critic (C-DNN) uses additional information during training. Robustness is built into the learning process via domain randomization over mass and actuation delay, guided by a carefully designed reward that emphasizes tracking accuracy, smoothness, and collision avoidance. Compared to a model-based LQG baseline, the DRL controller demonstrates comparable nominal performance but superior resilience under significant uncertainties, validated through extensive simulations and vision-based rendering, indicating practical potential for real-world vision-based MAV tracking.
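Domain randomization over mass and actuation delay can be illustrated with a short per-episode sampling sketch. This is not the authors' implementation. The sampling ranges simply span the nominal ($\alpha=1$, $\delta=0$ ms) and worst-case ($\alpha=0.6$, $\delta=50$ ms) scenarios named in the figure captions, and reading $\alpha$ as a mass mismatch factor is an assumption. The uniform distribution, the 10 ms control period, and all names (`EpisodeParams`, `sample_episode_params`, `DelayedActuator`) are hypothetical.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class EpisodeParams:
    mass_scale: float  # alpha: assumed mass mismatch factor (1.0 = nominal)
    delay_steps: int   # delta: actuation delay expressed in control steps


def sample_episode_params(rng, alpha_range=(0.6, 1.0),
                          delay_ms_range=(0.0, 50.0), dt_ms=10.0):
    """Draw one set of randomized parameters at the start of an episode.

    The ranges and the uniform distribution are illustrative assumptions.
    """
    alpha = rng.uniform(*alpha_range)
    delay_ms = rng.uniform(*delay_ms_range)
    return EpisodeParams(mass_scale=alpha, delay_steps=round(delay_ms / dt_ms))


class DelayedActuator:
    """FIFO buffer that applies each command `delay_steps` steps late."""

    def __init__(self, delay_steps, hold_action):
        # Until the first real command arrives, the vehicle sees hold_action.
        self._queue = deque([hold_action] * delay_steps)

    def step(self, action):
        self._queue.append(action)
        return self._queue.popleft()


rng = random.Random(0)
params = sample_episode_params(rng)
actuator = DelayedActuator(params.delay_steps, hold_action=0.0)
```

Resampling `params` at every episode reset exposes the policy to a different mass and delay each time, which is the mechanism by which robustness enters training in this kind of scheme.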

Abstract

The capability to autonomously track a non-cooperative target is a key technological requirement for micro aerial vehicles. In this paper, we propose an output feedback control scheme based on deep reinforcement learning for controlling a micro aerial vehicle to persistently track a flying target while maintaining visual contact. The proposed method leverages relative position data for control, relaxing the assumption of access to full state information that is typical of related approaches in the literature. Moreover, we exploit classical robustness indicators in the learning process through domain randomization to increase the robustness of the learned policy. Experimental results validate the proposed approach for target tracking, demonstrating high performance and robustness with respect to mass mismatches and control delays. The resulting nonlinear controller significantly outperforms a standard model-based design in numerous off-nominal scenarios.
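Because the controller receives only relative position rather than the full state, quantities such as relative velocity have to be inferred from a window of past measurements. The sketch below shows one common way to build such a history-augmented observation. The window length, the zero padding at episode start, and the class name `ObservationHistory` are hypothetical, and the paper's actual observation may contain other quantities (for example past actions).

```python
from collections import deque


class ObservationHistory:
    """Fixed-length window of relative-position measurements.

    Illustrative only: horizon and zero-initialization are assumptions.
    """

    def __init__(self, horizon, obs_dim=3):
        zero = (0.0,) * obs_dim
        # maxlen makes the deque drop the oldest entry on each append.
        self._buf = deque([zero for _ in range(horizon)], maxlen=horizon)

    def push(self, rel_pos):
        """Add the newest measurement and return the flattened window,
        ordered from oldest to newest."""
        self._buf.append(tuple(float(x) for x in rel_pos))
        return [x for obs in self._buf for x in obs]


history = ObservationHistory(horizon=2)
history.push((1.0, 2.0, 3.0))
policy_input = history.push((4.0, 5.0, 6.0))
```

Here `policy_input` holds the two most recent relative positions concatenated into one vector, which is the form a feedforward policy network would consume.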

Paper Structure (12 sections, 15 equations, 6 figures, 2 tables)

This paper contains 12 sections, 15 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Target tracking task. The tracker (blue) follows the target (red) while maintaining attitude alignment.
  • Figure 2: Example of a MAV trajectory obtained by applying the DRL policy in the nominal scenario $\alpha=1$, $\delta=0$ ms.
  • Figure 3: Evolution of the tracking error and of the relative distance obtained by applying the DRL policy in the nominal scenario $\alpha=1$, $\delta=0$ ms. The keep-out radius is depicted in red (dashed).
  • Figure 4: MAV trajectories obtained by applying the DRL and LQG policies in the worst-case scenario $\alpha=0.6$, $\delta=50$ ms.
  • Figure 5: Evolution of the tracking error and of the relative distance obtained by applying the DRL policy in the worst-case scenario $\alpha=0.6$, $\delta=50$ ms. The keep-out radius is depicted in red (dashed).
  • ...and 1 more figure

Theorems & Definitions (1)

  • Remark 1