An Integrated Imitation and Reinforcement Learning Methodology for Robust Agile Aircraft Control with Limited Pilot Demonstration Data

Gulay Goktas Sever; Umut Demir; Abdullah Sadik Satir; Mustafa Cagatay Sahin; Nazim Kemal Ure

TL;DR

The work addresses robust agile aircraft maneuver generation with limited pilot data by using a source model to generate unlimited data and an integrated pipeline of imitation learning (IL), transfer learning (TL), and reinforcement learning (RL). It starts with imitation learning via Behavior Cloning, made more robust with Confidence-DAgger, then applies transfer learning to adapt to a target aircraft with minimal data, and finally employs additive reinforcement learning (TD3) to adapt to updated dynamics. The final command is $A_{TL+RL} = A_{TL} + C_{RL} A_{RL}$ with $C_{RL}\in(0,1]$. The approach is validated using real pilot data from Turkish Aerospace Industries with an open-source F-16 as the source model, achieving cross-trim and cross-aircraft generalization with few target demonstrations and rapid RL fine-tuning (1–2 hours). The results demonstrate a data-efficient, robust, transferable framework for agile maneuver generation that reduces pilot data requirements and accelerates validation of prototypes.
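The additive command $A_{TL+RL} = A_{TL} + C_{RL} A_{RL}$ can be illustrated with a minimal Python sketch. The function name `combine_actions`, the list-of-floats action representation, and the clipping to a normalized range `[-1, 1]` are assumptions made for illustration; the summary above does not state how actions are represented or bounded.

```python
from typing import List, Sequence


def combine_actions(
    a_tl: Sequence[float],
    a_rl: Sequence[float],
    c_rl: float,
    low: float = -1.0,   # assumed normalized lower command limit
    high: float = 1.0,   # assumed normalized upper command limit
) -> List[float]:
    """Return A_TL + C_RL * A_RL element-wise, clipped to [low, high].

    a_tl: command from the transfer-learned policy.
    a_rl: corrective command from the RL policy.
    c_rl: scaling coefficient, required to lie in (0, 1].
    """
    if not 0.0 < c_rl <= 1.0:
        raise ValueError("c_rl must lie in (0, 1]")
    if len(a_tl) != len(a_rl):
        raise ValueError("a_tl and a_rl must have the same length")
    # Add the scaled RL correction to the TL command, then clip
    # (the clipping step is an assumption, not taken from the source).
    return [min(high, max(low, tl + c_rl * rl)) for tl, rl in zip(a_tl, a_rl)]


if __name__ == "__main__":
    # Hypothetical two-channel command, for illustration only.
    print(combine_actions([0.5, -0.2], [0.4, 0.4], c_rl=0.5))
```

A smaller $C_{RL}$ keeps the combined command close to the transfer-learned policy, so the RL term acts as a bounded correction rather than a replacement.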

Abstract

In this paper, we present a methodology for constructing data-driven maneuver generation models for agile aircraft that can generalize across a wide range of trim conditions and aircraft model parameters. Maneuver generation models play a crucial role in the testing and evaluation of aircraft prototypes, providing insights into the maneuverability and agility of the aircraft. However, constructing the models typically requires extensive amounts of real pilot data, which can be time-consuming and costly to obtain. Moreover, models built with limited data often struggle to generalize beyond the specific flight conditions covered in the original dataset. To address these challenges, we propose a hybrid architecture that leverages a simulation model, referred to as the source model. This open-source agile aircraft simulator shares similar dynamics with the target aircraft and allows us to generate unlimited data for building a proxy maneuver generation model. We then fine-tune this model to the target aircraft using a limited amount of real pilot data. Our approach combines techniques from imitation learning, transfer learning, and reinforcement learning to achieve this objective. To validate our methodology, we utilize real agile pilot data provided by Turkish Aerospace Industries (TAI). By employing the F-16 as the source model, we demonstrate that it is possible to construct a maneuver generation model that generalizes across various trim conditions and aircraft parameters without requiring any additional real pilot data. Our results showcase the effectiveness of our approach in developing robust and adaptable models for agile aircraft.

Paper Structure (23 sections, 5 equations, 13 figures, 4 tables, 1 algorithm)

Introduction
Previous Work
Conventional Control System Design Based Methods
Learning Based Methods
Contributions
Problem Definition
Problem Definition
Pilot Imitation Model
Pilot Behavior Cloning
Data Collection
Neural Network Architecture
Training and Evaluation
Robust Pilot Behavior Cloning via Confidence DAgger
Training and Evaluation
Transfer Learning to Different Aircraft Models
...and 8 more sections

Figures (13)

Figure 1: Our work outlines a three-step maneuver generation modeling process. First, we gather initial pilot demonstrations from a target aircraft model. Then, in Step 1, we apply our C-DAgger imitation learning method, which learns a behavior policy from these demonstrations using a simulated source aircraft model. In Step 2, transfer learning methods adapt this model to the target aircraft, accommodating different dynamics or characteristics. Lastly, Step 3 employs reinforcement learning techniques to further generalize the model, ensuring adaptability and robustness to variations in aircraft parameters.
Figure 2: Expert pilot maneuvers are used to extract body angular rates (roll rate P, pitch rate Q, and yaw rate R) as reference signals for the Nonlinear Dynamic Inversion (NDI) controller. These maneuvers are replicated across different trim conditions using an open-source aircraft model known as the source model.
Figure 3: Our implementation utilizes a composite LSTM autoencoder for high-dimensional aircraft data sequence reconstruction and prediction. The encoder part compresses data into a lower-dimensional latent space, capturing essential dynamics. Alongside, a prediction branch in the autoencoder leverages this latent representation to forecast future aircraft inputs.
Figure 4: To evaluate the BC policy, we executed Split-S and Chandelle maneuvers on the Aircraft-2 source model using BC and compared them with the trajectories executed by the NDI controller. This comparison helps determine how effectively the BC policy replicates the desired maneuvers.
Figure 5: Results of the Split-S and Chandelle maneuvers are shown in the first and second columns, respectively. Each column contains three subfigures: a comparison of the BC policy with NDI, followed by the results of the first and second C-DAgger iterations. The rows show that the C-DAgger algorithm improves maneuver performance.
...and 8 more figures

Theorems & Definitions (3)

Definition 1
Definition 2
Definition 3
