Reinforcement Learning Position Control of a Quadrotor Using Soft Actor-Critic (SAC)

Youssef Mahran, Zeyad Gamal, Ayman El-Badawy

TL;DR

This work presents a cascaded reinforcement learning approach for quadrotor control, in which a SAC agent computes the percentage of overall thrust along the quadrotor's z-axis and the desired roll and pitch angles, then passes them, together with the current yaw angle, to an attitude PID controller that generates the rotor RPMs. Compared to direct RPM control with SAC, the thrust-vector method trains faster and follows paths more smoothly and accurately in simulation. Dense reward shaping guides the agent toward stable hovering and precise tracking, while hyperparameters are tuned for stability at 50 Hz. The results demonstrate zero steady-state error in hover and superior path-following performance, highlighting the potential of combining RL with conventional PID controllers for low-level quadrotor control.

Abstract

This paper proposes a new Reinforcement Learning (RL) based control architecture for quadrotors. Whereas the literature focuses on controlling the four rotors' RPMs directly, this paper controls the quadrotor's thrust vector. The RL agent computes the percentage of overall thrust along the quadrotor's z-axis along with the desired roll ($φ$) and pitch ($θ$) angles. The agent then sends these control signals, along with the quadrotor's current yaw angle ($ψ$), to an attitude PID controller, which maps them to motor RPMs. The Soft Actor-Critic algorithm, a model-free off-policy stochastic RL algorithm, was used to train the RL agents. Training results show that the proposed thrust vector controller trains faster than conventional RPM controllers. Simulation results show smoother and more accurate path-following for the proposed thrust vector controller.
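The signal flow described in the abstract (policy output, then attitude PID, then motor RPMs) can be illustrated with a minimal sketch. Nothing below is taken from the paper's implementation: the action scaling, the tilt limit, the RPM ceiling, the PID gains, the motor ordering, and the mixer signs are all assumptions chosen for illustration, and `scale_action`, `PID`, and `AttitudePID` are hypothetical names. The sketch also assumes that passing the current yaw to the PID means it is used as the yaw setpoint, so the yaw error is zero.

```python
import math
from dataclasses import dataclass

# Every constant here is an illustrative assumption, not a value from the paper.
MAX_TILT_RAD = math.radians(30.0)  # assumed bound on commanded roll/pitch
MAX_RPM = 20000.0                  # assumed motor speed ceiling


def scale_action(action):
    """Map a raw policy output in [-1, 1]^3 to (thrust fraction, roll, pitch)."""
    a_thrust, a_roll, a_pitch = (max(-1.0, min(1.0, a)) for a in action)
    thrust_frac = 0.5 * (a_thrust + 1.0)  # [-1, 1] -> [0, 1]
    return thrust_frac, a_roll * MAX_TILT_RAD, a_pitch * MAX_TILT_RAD


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error, dt):
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


class AttitudePID:
    """Hypothetical attitude loop: angle errors in, four motor RPMs out."""

    def __init__(self):
        # Assumed gains, picked only so the example yields plausible numbers.
        self.roll = PID(kp=2000.0, ki=0.0, kd=50.0)
        self.pitch = PID(kp=2000.0, ki=0.0, kd=50.0)
        self.yaw = PID(kp=1000.0, ki=0.0, kd=0.0)

    def rpms(self, thrust_frac, roll_des, pitch_des, attitude, dt):
        roll, pitch, yaw = attitude
        yaw_des = yaw  # assumption: current yaw is the setpoint, so yaw error is 0
        u_roll = self.roll.step(roll_des - roll, dt)
        u_pitch = self.pitch.step(pitch_des - pitch, dt)
        u_yaw = self.yaw.step(yaw_des - yaw, dt)
        # Rotor thrust scales roughly with RPM squared, hence the square root.
        base = MAX_RPM * math.sqrt(thrust_frac)
        # Assumed X-layout mixer; order: front-left, front-right, rear-right, rear-left.
        raw = [
            base + u_roll + u_pitch + u_yaw,
            base - u_roll + u_pitch - u_yaw,
            base - u_roll - u_pitch + u_yaw,
            base + u_roll - u_pitch - u_yaw,
        ]
        return [max(0.0, min(MAX_RPM, r)) for r in raw]


# One control step at the 50 Hz rate mentioned in the TL;DR.
controller = AttitudePID()
thrust, roll_des, pitch_des = scale_action([0.0, 0.5, 0.0])
print(controller.rpms(thrust, roll_des, pitch_des, attitude=(0.0, 0.0, 0.3), dt=1 / 50))
```

In this arrangement the policy only has to choose three bounded quantities, while the inner loop handles the mapping to four motor speeds. That is one plausible reading of why the thrust-vector formulation is reported to train faster than direct RPM control.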

Paper Structure

This paper contains 11 sections, 3 equations, 12 figures, 2 tables.

Figures (12)

  • Figure 1: Block diagram of low-level RL RPM Controller
  • Figure 2: Block diagram of low-level RL Thrust Vector Controller
  • Figure 3: Mean reward for the stabilization training of thrust vector and RPM controllers
  • Figure 4: Stabilization response for an initial position of [-1.5, 1.5, 1.5] for thrust vector controller
  • Figure 5: Stabilization response for an initial position of [-1.5, 1.5, 1.5] for RPM controller
  • ...and 7 more figures