Implementing TD3 to train a Neural Network to fly a Quadcopter through an FPV Gate
Patrick Thomas, Kevin Schroeder, Jonathan Black
TL;DR
This work investigates training a velocity controller for a quadcopter to navigate through an FPV gate using Twin Delayed Deep Deterministic Policy Gradient (TD3) in a Gymnasium-based simulated environment. The policy is then transferred to a real quadcopter via a Raspberry Pi 4B using TensorFlow Lite and ROS1, with a mixed-reality lab setup to validate gate-center navigation. Training reveals TD3’s sample inefficiency and the critical impact of accurate vehicle dynamics on sim-to-real transfer: initial real-world tests underperformed because response times were mismodeled, but a re-tuned model and extended training yielded notable improvements. The study highlights the potential and challenges of sim-to-real reinforcement learning for drone control and points to future improvements, such as incorporating acceleration, recurrent architectures, and model-based pretraining, to close the gap to traditional controllers such as PD.
Abstract
Deep reinforcement learning has been shown to be a powerful tool for developing policies in environments where an optimal solution is unclear. In this paper, we apply Twin Delayed Deep Deterministic Policy Gradient (TD3) to train a neural network to act as a velocity controller for a quadcopter. The quadcopter's objective is to fly through a gate quickly without colliding with it. We transfer the trained policy to the real world by deploying it on a quadcopter in a laboratory environment. Finally, we demonstrate that the trained policy is able to navigate the drone to the gate in the real world.
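For readers unfamiliar with TD3, the critic target at the core of the algorithm can be sketched in a few lines of NumPy. This is an illustrative sketch of the standard TD3 target (target policy smoothing plus clipped double Q-learning), not the implementation used in this work. The function name `td3_target`, the callables `target_actor`, `target_q1`, and `target_q2`, and the hyperparameter values are assumptions chosen for illustration; the values are commonly used defaults, not the settings used here.

```python
import numpy as np

def td3_target(reward, done, next_obs, target_actor, target_q1, target_q2,
               rng, gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    """Compute the TD3 critic target for a batch of transitions.

    All names and default values are hypothetical placeholders.
    """
    # Target policy smoothing: perturb the target action with clipped noise,
    # then clip the result back into the valid action range.
    next_act = target_actor(next_obs)
    noise = np.clip(rng.normal(0.0, noise_std, size=next_act.shape),
                    -noise_clip, noise_clip)
    next_act = np.clip(next_act + noise, -act_limit, act_limit)

    # Clipped double Q-learning: take the smaller of the two target critics
    # to reduce overestimation of the action value.
    q_min = np.minimum(target_q1(next_obs, next_act),
                       target_q2(next_obs, next_act))

    # Bootstrapped target; terminal transitions (done == 1) do not bootstrap.
    return reward + gamma * (1.0 - done) * q_min
```

Both critics are regressed toward this shared target. The third ingredient of TD3, the delayed update, means the actor and the target networks are updated less often than the critics, for example once every two critic updates.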
