Deep deterministic policy gradient with symmetric data augmentation for lateral attitude tracking control of a fixed-wing aircraft
Yifei Li, Erik-Jan van Kampen
TL;DR
This work tackles offline reinforcement learning for fixed-wing flight control by exploiting dynamical symmetry to improve sample efficiency. It proposes a symmetric data augmentation (SDA) scheme and a dual-critic framework, including a two-step approximate value iteration (AVI), to make better use of augmented data in Deep Deterministic Policy Gradient (DDPG) learning. The authors also analyze the aircraft’s symmetry to justify the augmentation, and they integrate action-smoothness regularizers (CAPS) to improve robustness. Simulation results show that DDPG-SDA and DDPG-SCA (symmetric critic augmentation) converge faster than standard DDPG and achieve better state-space coverage and attitude-tracking performance. Overall, the approach reduces the exploration required while delivering reliable control in unvisited state-space regions. This highlights symmetry as a practical tool for sample-efficient offline RL in aerospace control, with potential for broader physics-informed RL applications.
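The core idea of symmetric data augmentation can be illustrated with a minimal Python sketch. It assumes, as an illustration only and not as the paper's definition, that every lateral state and action component changes sign under the aircraft's left-right mirror symmetry, and that the reward depends only on the squared roll-tracking error, so it is unchanged by the mirror map. The state layout and the names `mirror_state`, `mirror_action`, `tracking_reward`, and `symmetric_augment` are hypothetical.

```python
import numpy as np

# Hypothetical layout (illustrative assumption, not the paper's definition):
#   state  s = [beta, phi, p, r, phi_ref]  sideslip, roll angle, roll rate,
#                                           yaw rate, roll-angle reference
#   action a = [delta_a, delta_r]           aileron and rudder deflections
# A left-right mirror of the airframe flips the sign of every lateral
# quantity, so both mirror maps reduce to negation under this assumption.

def mirror_state(s: np.ndarray) -> np.ndarray:
    return -s

def mirror_action(a: np.ndarray) -> np.ndarray:
    return -a

def tracking_reward(s: np.ndarray) -> np.ndarray:
    # Hypothetical reward: negative squared roll-tracking error.
    # Invariant under negation because (-e)**2 == e**2.
    return -(s[..., 1] - s[..., 4]) ** 2

def symmetric_augment(S, A, R, S_next):
    """Return the batch with a mirrored copy of every (s, a, r, s') appended."""
    S_m = mirror_state(S)
    A_m = mirror_action(A)
    S_next_m = mirror_state(S_next)
    R_m = R.copy()  # reward unchanged under the mirror map (by assumption)
    return (np.concatenate([S, S_m]), np.concatenate([A, A_m]),
            np.concatenate([R, R_m]), np.concatenate([S_next, S_next_m]))

# Usage on a small random batch of 4 transitions.
rng = np.random.default_rng(0)
S = rng.normal(size=(4, 5))
A = rng.normal(size=(4, 2))
S_next = rng.normal(size=(4, 5))
R = tracking_reward(S_next)
S_aug, A_aug, R_aug, S_next_aug = symmetric_augment(S, A, R, S_next)
print(S_aug.shape)  # (8, 5)
```

Doubling the dataset this way adds samples in mirrored regions that the data-collection policy may never have visited. Whether those mirrored transitions are valid depends entirely on the symmetry assumption holding for the true dynamics and reward.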
Abstract
The symmetry of dynamical systems can be exploited to predict state transitions and to facilitate control policy optimization. This paper leverages system symmetry to develop sample-efficient offline reinforcement learning (RL) approaches. Assuming that the Markov Decision Process (MDP) is symmetric, a symmetric data augmentation method is proposed. The augmented samples are added to the Deep Deterministic Policy Gradient (DDPG) dataset to increase its coverage of the state-action space. Furthermore, sample utilization efficiency is improved by introducing a second critic trained on the augmented samples, resulting in a dual-critic structure. The aircraft model is verified to be symmetric, and flight control simulations show faster policy convergence when the augmented samples are used.
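The symmetry assumption that justifies the augmentation can be checked numerically. One simple approach, sketched below under stated assumptions, is to sample states and inputs and test whether the dynamics are odd under the mirror map, that is, f(-x, -u) = -f(x, u). The toy model `lateral_dynamics`, its coefficients, the assumed airspeed, and the helper `is_mirror_symmetric` are hypothetical stand-ins, not the paper's aircraft model or verification procedure.

```python
import numpy as np

# Toy lateral-directional dynamics with made-up coefficients (hypothetical,
# not the paper's model). x = [beta, phi, p, r], u = [delta_a, delta_r].
G_OVER_V = 9.81 / 100.0  # gravity over an assumed airspeed of 100 m/s

def lateral_dynamics(x: np.ndarray, u: np.ndarray) -> np.ndarray:
    beta, phi, p, r = x
    da, dr = u
    return np.array([
        -0.2 * beta + G_OVER_V * np.sin(phi) - r + 0.05 * dr,  # sideslip rate
        p,                                                     # roll rate (level-flight approx.)
        -4.0 * beta - 2.0 * p + 0.5 * r + 8.0 * da,            # roll acceleration
        3.0 * beta - 0.3 * p - 0.4 * r - 2.0 * dr,             # yaw acceleration
    ])

def biased_dynamics(x: np.ndarray, u: np.ndarray) -> np.ndarray:
    # Same model plus a constant roll-moment bias, which breaks the symmetry.
    return lateral_dynamics(x, u) + np.array([0.0, 0.0, 0.1, 0.0])

def is_mirror_symmetric(f, n_samples=1000, tol=1e-9, seed=0) -> bool:
    """Check f(-x, -u) == -f(x, u) on randomly sampled states and inputs."""
    rng = np.random.default_rng(seed)
    for _ in range(n_samples):
        x = rng.uniform(-0.5, 0.5, size=4)
        u = rng.uniform(-0.3, 0.3, size=2)
        if not np.allclose(f(-x, -u), -f(x, u), atol=tol):
            return False
    return True

print(is_mirror_symmetric(lateral_dynamics))  # True
print(is_mirror_symmetric(biased_dynamics))   # False
```

A sampled check like this can only find counterexamples within the sampled ranges; it does not prove symmetry over the whole flight envelope.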
