Deep Q-Learning-Based Gain Scheduling for Nonlinear Quadcopter Dynamics

Hossein Rastgoftar; Muhammad J. H. Zahed

TL;DR

Simulation results using a high-fidelity nonlinear quadcopter model demonstrate accurate trajectory tracking, bounded attitude excursions, smooth transition to hover after the final time, and consistent reward improvement, validating the effectiveness and robustness of the proposed learning-based gain scheduling strategy.

Abstract

This paper presents a deep Q-network (DQN)-based gain-scheduling framework for safety-critical quadcopter trajectory tracking. Instead of directly learning control inputs, the proposed approach selects from a finite set of pre-certified stabilizing gain vectors, enabling reinforcement learning to operate within a structured and stability-preserving control architecture. By exploiting the isotropic structure of the translational dynamics, feedback gains are shared across spatial axes to reduce dimensionality while preserving performance. The learned policy adapts feedback aggressiveness in real time, applying high authority during large transients and reducing gains near convergence to limit control effort. Simulation results using a high-fidelity nonlinear quadcopter model demonstrate accurate trajectory tracking, bounded attitude excursions, smooth transition to hover after the final time, and consistent reward improvement, validating the effectiveness and robustness of the proposed learning-based gain scheduling strategy.
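The selection mechanism described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the gain values (generated by placing four repeated poles at `-p` for a quadruple-integrator error model), the helper names `gains_from_pole`, `is_hurwitz`, `select_gain_index`, and `translational_command`, and the epsilon-greedy rule. The Q-values are passed in as a plain array; in the paper they would come from the trained deep Q-network.

```python
import numpy as np

def gains_from_pole(p):
    """Hypothetical gain vector: coefficients of (s + p)^4, ordered to
    multiply [position, velocity, acceleration, jerk] errors."""
    return np.array([p**4, 4 * p**3, 6 * p**2, 4 * p], dtype=float)

# Assumed finite action set: one gain vector per candidate pole location.
GAIN_SET = [gains_from_pole(p) for p in (1.0, 2.0, 3.0)]

def is_hurwitz(k):
    """Check that s^4 + k[3] s^3 + k[2] s^2 + k[1] s + k[0] has all roots
    in the open left half-plane (a stand-in for 'pre-certified')."""
    roots = np.roots([1.0, k[3], k[2], k[1], k[0]])
    return bool(np.all(roots.real < 0))

def select_gain_index(q_values, epsilon, rng):
    """Epsilon-greedy choice over the discrete gain set."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def translational_command(errors, k):
    """errors has shape (4, 3): rows are position, velocity, acceleration,
    and jerk errors; columns are the x, y, z axes. The same gain vector k
    is applied to every axis."""
    return -(k @ errors)

# Usage: greedy selection (epsilon = 0) with placeholder Q-values.
rng = np.random.default_rng(0)
q_values = np.array([0.1, 0.7, 0.3])   # placeholder, not learned values
idx = select_gain_index(q_values, epsilon=0.0, rng=rng)
command = translational_command(np.ones((4, 3)), GAIN_SET[idx])
```

Because the agent only ever picks an index into a set whose members each pass the stability check, exploration cannot produce a gain vector outside that set, which is the structural point the abstract makes. Sharing one gain vector across the three axes keeps the action space at the size of the gain set rather than its cube.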

Paper Structure (15 sections, 56 equations, 6 figures, 1 table)

This paper contains 15 sections, 56 equations, 6 figures, 1 table.

Introduction
Related Work
Contributions
Paper Outline
Problem Statement
Quadcopter Control Stability
Nonlinear State-Space Dynamics
Quadcopter Control
Control-Friendly Discrete-Time Dynamics
DQN-Based Gain Scheduling
MDP Formulation
Deep Q-Network Approximation
Learned Gain Scheduling Policy
Simulation Results
Conclusion

Figures (6)

Figure 2: DQN rollout: selected translational feedback gains shared across axes (representative gains shown). The policy increases gains during the initial transient and reduces them as tracking errors diminish.
Figure 3: DQN rollout: external error states. Translational error components (position/velocity/acceleration/jerk) converge toward the origin while the yaw channel remains regulated, demonstrating stable closed-loop behavior under learned gain scheduling.
Figure 4: Physical evaluation: inertial position versus desired position. The dotted line marks $T_f$; for $t>T_f$ the reference is held at $\mathbf{r}_d(T_f)$ and the quadcopter settles to hover.
Figure 5: Physical evaluation: Euler angles $(\phi,\theta,\psi)$. Attitude excursions remain small and decay to near zero as tracking converges.
Figure 6: Physical evaluation: control inputs. The thrust second-derivative command $\ddot T$ and body torques $\boldsymbol{\tau}$ are largest during the initial transient and decrease as the state approaches the reference.
...and 1 more figure
