Implementing TD3 to train a Neural Network to fly a Quadcopter through an FPV Gate

Patrick Thomas, Kevin Schroeder, Jonathan Black

TL;DR

This work investigates training a velocity controller for a quadcopter to navigate through an FPV gate using Twin Delayed Deep Deterministic Policy Gradient (TD3) in a Gymnasium-based simulated environment. The policy is then transferred to a real quadcopter via a Raspberry Pi 4B using TensorFlow Lite and ROS1, with a mixed-reality lab setup to validate gate-centre navigation. Training reveals TD3’s sample inefficiency and the critical impact of accurate vehicle dynamics for sim-to-real transfer; initial real-world tests underperformed due to mismodeled response times, but a re-tuned model and extended training yielded notable improvements. The study highlights the potential and challenges of sim-to-real reinforcement learning for drone control and points to future improvements such as incorporating acceleration, recurrent architectures, and model-based pretraining to close the gap to traditional controllers like PD.

Abstract

Deep reinforcement learning has been shown to be a powerful tool for developing policies in environments where an optimal solution is unclear. In this paper, we apply Twin Delayed Deep Deterministic Policy Gradient (TD3) to train a neural network to act as a velocity controller for a quadcopter. The quadcopter's objective is to fly through a gate quickly without crashing into it. We transfer the trained policy to the real world by deploying it on a quadcopter in a laboratory environment. Finally, we demonstrate that the trained policy is able to navigate the drone to the gate in the real world.
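
For readers unfamiliar with TD3, the following is a minimal sketch of how the algorithm forms its critic training target, using clipped double-Q learning and target policy smoothing. It is not the authors' implementation: the function name `td3_target`, the callable arguments, and the hyperparameter values (`gamma`, `policy_noise`, `noise_clip`, `max_action`) are illustrative assumptions, not values reported in this paper.

```python
import numpy as np


def td3_target(reward, done, next_state,
               actor_target, critic1_target, critic2_target,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5,
               max_action=1.0, rng=None):
    """Compute TD3 critic targets for a batch of transitions.

    All names and default values here are hypothetical placeholders.
    actor_target(next_state) -> actions, shape (batch, action_dim)
    criticN_target(next_state, action) -> Q-values, shape (batch,)
    """
    rng = np.random.default_rng() if rng is None else rng

    # Target policy smoothing: perturb the target action with clipped noise.
    next_action = actor_target(next_state)
    noise = rng.normal(0.0, policy_noise, size=next_action.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, -max_action, max_action)

    # Clipped double-Q: take the smaller of the two target critics' estimates
    # to reduce overestimation bias.
    q1 = critic1_target(next_state, next_action)
    q2 = critic2_target(next_state, next_action)
    q_min = np.minimum(q1, q2)

    # Bellman target; terminal transitions (done == 1) do not bootstrap.
    return reward + gamma * (1.0 - done) * q_min
```

In a full TD3 loop, both critics would be regressed toward this target at every step, while the actor and the target networks would be updated less often (the "delayed" part of the name).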

Paper Structure

This paper contains 10 sections, 7 equations, 7 figures, 2 algorithms.

Figures (7)

  • Figure 1: The gate design used for the BattleDrones competition. The AprilTag is located in the front lower center of the gate. The gate opening is 1.6 meters wide by 1 meter tall.
  • Figure 2: The drone and gate in the visualization of the simulated environment. The drone is the blue object and the gate is the maroon object. The bounds are shown by the gray walls and the ground is shown by the green floor.
  • Figure 3: Diagram depicting the computer setup in the SpaceDrones lab. We use a distributed system with ROS 1 as the backbone for communicating between nodes on the network.
  • Figure 4: Training plot for the final training session, showing learning across $2.5 \times 10^6$ time steps. This amounts to about 15,000 'games' played. A perfect score would be slightly higher than 400.
  • Figure 5: Images from the flight test showing the mixed reality setup. In the first subfigure, the drone in the virtual world can be seen with the virtual gate. The second subfigure shows the actual drone flying in an open lab space. The drone's position and orientation in the real world are used to move the virtual drone in the simulated world.
  • ...and 2 more figures