Reinforcement Learning Based Prediction of PID Controller Gains for Quadrotor UAVs
Serhat Sönmez, Luca Montecchio, Simone Martini, Matthew J. Rutherford, Alessandro Rizzo, Margareta Stefanovic, Kimon P. Valavanis
TL;DR
This work tackles the challenge of achieving accurate quadrotor trajectory tracking by automatically tuning the inner-loop PD gains via reinforcement learning. A DDPG-based agent is trained offline in MATLAB/Simulink to adjust five normalized gain weights, using a piecewise attitude-error reward, and is validated through numerical simulations, Hardware-In-The-Loop testing, and outdoor flights. The study demonstrates that RL-tuned gains reduce attitude errors and overshoot compared with hand-tuned gains, and the gains can adapt online to disturbances, despite training without some physical effects. The results highlight the potential of RL-based fine-tuning to bridge simulation and real-world UAV control, with practical implications for robust, adaptable autonomous flight and directions for future enhancements such as disturbance-aware training and GPS-robust positioning.
Abstract
A reinforcement learning (RL) based methodology is proposed and implemented for online fine-tuning of PID controller gains, thereby improving the accuracy of quadrotor trajectory tracking. The RL agent is first trained offline on a quadrotor PID attitude controller and then validated through simulations and experimental flights. The agent uses the Deep Deterministic Policy Gradient (DDPG) algorithm, an off-policy actor-critic method. Training and simulation studies are performed using MATLAB/Simulink and the UAV Toolbox Support Package for PX4 Autopilots. The hand-tuned and RL-tuned controllers are then evaluated and compared. The results show that the RL agent adjusts the controller gains during flight and achieves smaller attitude errors than the hand-tuned approach, significantly improving attitude tracking performance.
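To make the idea of normalized gain weights and a piecewise attitude-error reward concrete, the following is a minimal Python sketch, not the authors' MATLAB/Simulink implementation. It assumes that the agent's action is a vector of five weights in [-1, 1] that scale a set of baseline gains, and that the reward depends only on the largest absolute attitude error. The baseline values, the scaling span, the reward thresholds, and the names `BASE_GAINS`, `scale_gains`, and `piecewise_reward` are all hypothetical; none of them come from the source.

```python
import numpy as np

# Hypothetical hand-tuned baseline gains; the source does not list values.
BASE_GAINS = np.array([6.0, 6.0, 0.15, 0.15, 2.8])


def scale_gains(weights, base=BASE_GAINS, span=0.5):
    """Map five normalized weights in [-1, 1] to gains within +/- span of baseline.

    The multiplicative mapping and the 50% span are assumptions for illustration.
    """
    w = np.clip(np.asarray(weights, dtype=float), -1.0, 1.0)
    return base * (1.0 + span * w)


def piecewise_reward(att_error_rad, tight=0.02, loose=0.10):
    """Piecewise reward on the largest absolute attitude error (radians).

    Thresholds `tight` and `loose` are assumed values, not taken from the paper.
    """
    e = float(np.max(np.abs(att_error_rad)))
    if e <= tight:
        return 1.0                                   # error within the tight band
    if e <= loose:
        return 1.0 - (e - tight) / (loose - tight)   # linear decay from 1 to 0
    return -1.0                                      # penalty for large errors


if __name__ == "__main__":
    action = [0.2, -0.1, 0.0, 0.5, -1.0]             # example agent output
    print("gains: ", scale_gains(action))
    print("reward:", piecewise_reward([0.01, 0.03, 0.0]))
```

In a DDPG setup of this kind, the actor network would output the weight vector at each step and the critic would be trained on rewards such as the one above; the clipping in `scale_gains` keeps any out-of-range action from producing gains outside the assumed band around the baseline.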
