Deep Q-Learning-Based Gain Scheduling for Nonlinear Quadcopter Dynamics

Hossein Rastgoftar, Muhammad J. H. Zahed

TL;DR

Simulation results using a high-fidelity nonlinear quadcopter model demonstrate accurate trajectory tracking, bounded attitude excursions, smooth transition to hover after the final time, and consistent reward improvement, validating the effectiveness and robustness of the proposed learning-based gain scheduling strategy.

Abstract

This paper presents a deep Q-network (DQN)-based gain-scheduling framework for safety-critical quadcopter trajectory tracking. Instead of directly learning control inputs, the proposed approach selects from a finite set of pre-certified stabilizing gain vectors, enabling reinforcement learning to operate within a structured and stability-preserving control architecture. By exploiting the isotropic structure of the translational dynamics, feedback gains are shared across spatial axes to reduce dimensionality while preserving performance. The learned policy adapts feedback aggressiveness in real time, applying high authority during large transients and reducing gains near convergence to limit control effort. Simulation results using a high-fidelity nonlinear quadcopter model demonstrate accurate trajectory tracking, bounded attitude excursions, smooth transition to hover after the final time, and consistent reward improvement, validating the effectiveness and robustness of the proposed learning-based gain scheduling strategy.
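The selection mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each translational axis behaves like a fourth-order integrator chain in the error states (position, velocity, acceleration, jerk), builds a small hypothetical gain set by placing four poles at s = -a, and uses a plain epsilon-greedy rule over placeholder Q-values in place of a trained network. The names `gains_from_pole`, `GAIN_SET`, `is_hurwitz`, `select_gain_index`, and `shared_axis_feedback` are illustrative assumptions.

```python
import numpy as np


def gains_from_pole(a):
    """Gains (k_pos, k_vel, k_acc, k_jerk) that place four poles at s = -a.

    Assumes integrator-chain error dynamics, so the closed-loop
    characteristic polynomial is s^4 + k_jerk s^3 + k_acc s^2 + k_vel s + k_pos.
    """
    coeffs = np.poly([-a] * 4)   # [1, 4a, 6a^2, 4a^3, a^4]
    return coeffs[:0:-1]         # [a^4, 4a^3, 6a^2, 4a]


# Hypothetical finite gain set (illustrative pole locations only).
GAIN_SET = np.array([gains_from_pole(a) for a in (1.0, 2.0, 3.0)])


def is_hurwitz(k):
    """Check that every closed-loop pole has a negative real part."""
    poly = [1.0, k[3], k[2], k[1], k[0]]
    return bool(np.all(np.roots(poly).real < 0.0))


def select_gain_index(q_values, epsilon, rng):
    """Epsilon-greedy choice over the indices of the finite gain set."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def shared_axis_feedback(errors, k):
    """Apply one gain vector to all three axes.

    errors has shape (4, 3): rows are position, velocity, acceleration,
    and jerk errors; columns are the x, y, z axes.
    """
    return -(k @ errors)


rng = np.random.default_rng(0)
q_values = np.array([0.1, 0.9, 0.3])   # placeholder for a Q-network output
idx = select_gain_index(q_values, epsilon=0.0, rng=rng)
command = shared_axis_feedback(np.ones((4, 3)), GAIN_SET[idx])
```

Because the policy only ever returns an index into `GAIN_SET`, any stability check such as `is_hurwitz` can be run once, offline, on every candidate. Sharing one four-element gain vector across the three axes keeps the action a single index rather than one choice per axis.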

Paper Structure (15 sections, 56 equations, 6 figures, 1 table)

Figures (6)

  • Figure 2: DQN rollout: selected translational feedback gains shared across axes (representative gains shown). The policy increases gains during the initial transient and reduces them as tracking errors diminish.
  • Figure 3: DQN rollout: external error states. Translational error components (position/velocity/acceleration/jerk) converge toward the origin while the yaw channel remains regulated, demonstrating stable closed-loop behavior under learned gain scheduling.
  • Figure 4: Physical evaluation: inertial position versus desired position. The dotted line marks $T_f$; for $t>T_f$ the reference is held at $\mathbf{r}_d(T_f)$ and the quadcopter settles to hover.
  • Figure 5: Physical evaluation: Euler angles $(\phi,\theta,\psi)$. Attitude excursions remain small and decay to near zero as tracking converges.
  • Figure 6: Physical evaluation: control inputs. The thrust second-derivative command $\ddot T$ and body torques $\boldsymbol{\tau}$ are largest during the initial transient and decrease as the state approaches the reference.
  • ...and 1 more figure