Model-Free versus Model-Based Reinforcement Learning for Fixed-Wing UAV Attitude Control Under Varying Wind Conditions

David Olivares; Pierre Fournier; Pavan Vasishta; Julien Marzat

TL;DR

The results show that the model-based Temporal Difference Model Predictive Control (TD-MPC) agent outperforms both the PID controller and the model-free reinforcement learning methods in tracking accuracy and robustness across reference difficulties, particularly in nonlinear flight regimes.

Abstract

This paper evaluates and compares the performance of model-free and model-based reinforcement learning for the attitude control of fixed-wing unmanned aerial vehicles, using a PID controller as a reference point. The comparison focuses on their ability to handle varying flight dynamics and wind disturbances in a simulated environment. Our results show that the model-based Temporal Difference Model Predictive Control (TD-MPC) agent outperforms both the PID controller and the model-free reinforcement learning methods in tracking accuracy and robustness across reference difficulties, particularly in nonlinear flight regimes. Furthermore, we introduce actuation fluctuation as a key metric to assess energy efficiency and actuator wear, and we test two approaches from the literature for reducing it: an action variation penalty and conditioning for action policy smoothness (CAPS). We also evaluate all control methods under stochastic turbulence and under wind gusts separately, in order to measure their effects on tracking performance, observe the methods' limitations, and outline the implications for the Markov decision process formalism.
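The two smoothing approaches and the fluctuation metric named in the abstract can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration: the abstract does not give the paper's exact definition of actuation fluctuation, so the mean absolute change between consecutive commands is used here as a stand-in, and the function names, weights, and noise scale `sigma` are hypothetical. The regularizer follows the general form of CAPS (a temporal term comparing actions at consecutive states and a spatial term comparing actions at a state and a perturbed copy of it), not necessarily the configuration used by the authors.

```python
import numpy as np

def actuation_fluctuation(actions):
    """Mean absolute change between consecutive actions (assumed definition).

    actions: array of shape (T, n_actuators), one row per control step.
    """
    actions = np.asarray(actions, dtype=float)
    return float(np.mean(np.abs(np.diff(actions, axis=0))))

def action_variation_penalty(prev_action, action, weight=0.1):
    """Reward-shaping term that penalizes the change in action between steps."""
    delta = np.asarray(action, dtype=float) - np.asarray(prev_action, dtype=float)
    return -weight * float(np.sum(np.abs(delta)))

def caps_regularizer(policy, states, next_states,
                     sigma=0.05, lam_t=1.0, lam_s=1.0, rng=None):
    """CAPS-style loss term added to the policy objective.

    policy: callable mapping a (N, state_dim) batch to a (N, action_dim) batch.
    """
    rng = np.random.default_rng() if rng is None else rng
    actions = policy(states)
    # Temporal smoothness: actions at consecutive states should be close.
    temporal = np.mean(np.linalg.norm(actions - policy(next_states), axis=-1))
    # Spatial smoothness: actions at nearby (noise-perturbed) states should be close.
    perturbed = states + rng.normal(0.0, sigma, size=states.shape)
    spatial = np.mean(np.linalg.norm(actions - policy(perturbed), axis=-1))
    return float(lam_t * temporal + lam_s * spatial)
```

The penalty acts on the reward the agent receives, whereas the CAPS-style term acts on the policy loss during training; the fluctuation metric is only used for evaluation and can be applied to any controller's command history, including the PID baseline.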

Paper Structure (29 sections, 18 equations, 5 figures, 8 tables)

INTRODUCTION
RELATED WORKS
Reinforcement Learning Control of UAVs
Model-Free RL
Model-Based RL
UAV MODEL
Kinematics
Aerodynamic Forces and Moments
Wind Disturbances
Aerodynamic Model
Propulsion Forces and Moments
Actuator Dynamics and Constraints
CONTROL METHODS
PID Control
Reinforcement Learning (RL) Control
...and 14 more sections

Figures (5)

Figure 1: TD-MPC Control Block Diagram. 1) The trained TOLD model is used to simulate candidate trajectories and estimate their returns. 2) Using importance sampling, the Gaussian action distribution is updated. 3) An action is sampled from the action distribution and applied.
Figure 2: Stochastic Turbulence + Action Regulation. Average tracking RMSE on nominal reference difficulty.
Figure 3: Wind Gusts + Action Regulation. Average tracking RMSE on nominal reference difficulty.
Figure 4: TD-MPC's Superiority for Hard References: PPO vs TD-MPC. Red dashed lines are the references: Roll = $55^{\circ}$, Pitch = $28^{\circ}$. The red area around the reference line corresponds to $\pm 5^{\circ}$ error bounds.
Figure 5: Mitigating a Highly Oscillating Action Policy: SAC vs SAC + CAPS. Red dashed lines are the references: Roll = $-15^{\circ}$, Pitch = $-10^{\circ}$. The red area around the reference line corresponds to $\pm 5^{\circ}$ error bounds.
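
The three steps in the Figure 1 caption can be sketched as a sampling-based planning loop. This is a simplified illustration under stated assumptions, not the authors' implementation: `dynamics`, `reward`, and `value` are hypothetical callables standing in for the learned TOLD model, and the horizon, sample counts, temperature, and minimum standard deviation are arbitrary illustrative values. Actions are assumed to be normalized to [-1, 1].

```python
import numpy as np

def plan(z0, dynamics, reward, value, action_dim=1, horizon=5, n_samples=256,
         n_elites=32, n_iters=6, temperature=0.5, gamma=0.99, rng=None):
    """Return one action for state z0 using a learned model (hypothetical callables)."""
    rng = np.random.default_rng() if rng is None else rng
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(n_iters):
        # 1) Simulate candidate action sequences with the model and estimate returns.
        noise = rng.standard_normal((n_samples, horizon, action_dim))
        actions = np.clip(mean + std * noise, -1.0, 1.0)
        z = np.repeat(np.asarray(z0, dtype=float)[None], n_samples, axis=0)
        returns = np.zeros(n_samples)
        discount = 1.0
        for t in range(horizon):
            returns += discount * reward(z, actions[:, t])
            z = dynamics(z, actions[:, t])
            discount *= gamma
        returns += discount * value(z)  # bootstrap beyond the horizon
        # 2) Re-weight the best candidates and update the Gaussian action distribution.
        elite_idx = np.argsort(returns)[-n_elites:]
        elite_actions = actions[elite_idx]
        elite_returns = returns[elite_idx]
        w = np.exp(temperature * (elite_returns - elite_returns.max()))
        w /= w.sum()
        mean = np.sum(w[:, None, None] * elite_actions, axis=0)
        var = np.sum(w[:, None, None] * (elite_actions - mean) ** 2, axis=0)
        std = np.maximum(np.sqrt(var), 0.05)  # keep a minimum amount of exploration
    # 3) Sample the first action of the plan and apply it.
    return np.clip(mean[0] + std[0] * rng.standard_normal(action_dim), -1.0, 1.0)

# Toy usage: drive a scalar state toward zero with a hand-written "model".
toy_dynamics = lambda z, a: z + 0.5 * a
toy_reward = lambda z, a: -np.sum(z ** 2, axis=-1)
toy_value = lambda z: np.zeros(len(z))
action = plan(np.array([1.0]), toy_dynamics, toy_reward, toy_value,
              rng=np.random.default_rng(0))
```

In a receding-horizon setting the loop is rerun at every control step from the newly observed state, and only the first action of each plan is applied.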
