Dual Agent Learning Based Aerial Trajectory Tracking

Shaswat Garg, Houman Masnavi, Baris Fidan, Farrokh Janabi-Sharifi

TL;DR

This paper presents a reinforcement learning framework for trajectory tracking of unmanned aerial vehicles in cluttered environments. Its dual-agent architecture, with one agent for trajectory tracking and one for collision avoidance, targets real-time performance and adaptability to uncertainties in both static and dynamic scenarios.

Abstract

This paper presents a novel reinforcement learning framework for trajectory tracking of unmanned aerial vehicles in cluttered environments using a dual-agent architecture. Traditional optimization methods for trajectory tracking face significant computational challenges and lack robustness in dynamic environments. Our approach employs deep reinforcement learning (RL) to overcome these limitations, leveraging 3D pointcloud data to perceive the environment without relying on memory-intensive obstacle representations like occupancy grids. The proposed system features two RL agents: one for predicting UAV velocities to follow a reference trajectory and another for managing collision avoidance in the presence of obstacles. This architecture ensures real-time performance and adaptability to uncertainties. We demonstrate the efficacy of our approach through simulated and real-world experiments, highlighting improvements over state-of-the-art RL and optimization-based methods. Additionally, a curriculum learning paradigm is employed to scale the algorithms to more complex environments, ensuring robust trajectory tracking and obstacle avoidance in both static and dynamic scenarios.

Paper Structure

This paper contains 18 sections, 12 equations, 7 figures, 1 table, 1 algorithm.

Figures (7)

  • Figure 1: The pipeline for our dual-agent architecture. We switch between the trajectory tracking ($\pi_{trt}$) and collision avoidance ($\pi_{cva}$) policies based on a collision criterion that takes the current pointcloud data and the velocity of the UAV as input.
  • Figure 2: Schematic diagram of a UAV maneuvering around obstacles. The UAV is in the middle (green circle), with the LiDAR pointcloud around it (blue circle). $O_1$, $O_2$ and $O_3$ are the obstacles. The scenarios shown differ in the UAV's velocity, which may be any of $\mathbf{v_1}$, $\mathbf{v_2}$, $\mathbf{v_3}$ or $\mathbf{v_4}$.
  • Figure 3: An illustration of the collision set $S_{col}$ (green) and the trajectory set $S_{trt}$ (orange) with the two policies $\pi_{cva}$ and $\pi_{trt}$. On the left, with the naive approach, the system switches constantly between the collision avoidance policy $\pi_{cva}$ and the trajectory tracking policy $\pi_{trt}$. On the right, with the appropriate switching mechanism, the collision avoidance agent pushes the UAV away from the obstacle and hands control back to the trajectory tracking policy only when it is safe.
  • Figure 4: Curriculum training for the trajectory tracking policy $\pi_{trt}$ and the collision avoidance policy $\pi_{cva}$. (a) The environment used to train $\pi_{trt}$, seen from the X-Y plane. Initially, the UAV is tasked with following trajectories starting from the point $\mathbf{p}_{start}$. After training, the UAV is repositioned randomly within the region $C_{rdm}$ and retrained to follow the trajectories indicated by the black rays. (b) Training of $\pi_{cva}$ starts in a simple world containing two cylinders of radius $0.1$ m. The policy is then trained in an environment where the cylinder radius is $0.3$ m, and finally in an environment containing cuboids.
  • Figure 5: Comparison of trajectories generated in different obstacle worlds by RL algorithms and SOTA. (a) 5 obstacles, (b) 9 obstacles, (c) Random world with 22 obstacles. In all three scenarios, the proposed dual-agent algorithm tracks the trajectory more precisely, whereas the state-of-the-art (SOTA) algorithm exhibits significant deviations along the Z-axis and produces a suboptimal trajectory.
  • ...and 2 more figures
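
The policy switching described in Figures 1 and 3 can be illustrated with a small sketch. The captions do not give the paper's actual collision criterion, so everything below is an assumption made for illustration: the class name `PolicySwitch`, the two distance thresholds, and the velocity cone test are hypothetical. The sketch only shows the general idea of hysteresis, where control passes to the avoidance policy when a pointcloud point ahead of the UAV gets too close, and returns to the tracking policy only after all points lie beyond a larger radius, which avoids the constant switching of the naive approach.

```python
import math
from dataclasses import dataclass


@dataclass
class PolicySwitch:
    """Hysteresis switch between a tracking and an avoidance policy.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    d_enter: float = 1.0   # hand over to avoidance below this distance (m)
    d_exit: float = 1.5    # hand back to tracking only above this distance (m)
    cone_cos: float = 0.5  # points within 60 degrees of the velocity count as "ahead"
    avoiding: bool = False

    def _nearest(self, points, velocity, ahead_only):
        """Distance to the closest point, optionally only those ahead of the UAV."""
        speed = math.sqrt(sum(c * c for c in velocity))
        best = math.inf
        for p in points:  # points are (x, y, z) offsets relative to the UAV
            dist = math.sqrt(sum(c * c for c in p))
            if dist == 0.0:
                return 0.0
            if ahead_only and speed > 0.0:
                cos_angle = sum(a * b for a, b in zip(p, velocity)) / (dist * speed)
                if cos_angle < self.cone_cos:
                    continue  # point is not in the direction of travel
            best = min(best, dist)
        return best

    def select(self, points, velocity):
        """Return "cva" (collision avoidance) or "trt" (trajectory tracking)."""
        if self.avoiding:
            # Leave avoidance only once every point is beyond the larger radius.
            if self._nearest(points, velocity, ahead_only=False) > self.d_exit:
                self.avoiding = False
        elif self._nearest(points, velocity, ahead_only=True) < self.d_enter:
            self.avoiding = True
        return "cva" if self.avoiding else "trt"
```

With these assumed thresholds, a point 0.8 m ahead triggers the avoidance policy, a later reading of 1.2 m keeps it active (a single-threshold rule would already have switched back), and only a reading beyond 1.5 m returns control to the tracking policy. The gap between `d_enter` and `d_exit` is the design choice that trades a slightly longer detour for stable behavior near obstacles.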