Transformer-Based Fault-Tolerant Control for Fixed-Wing UAVs Using Knowledge Distillation and In-Context Adaptation

Francisco Giral, Ignacio Gómez, Ricardo Vinuesa, Soledad Le Clainche

TL;DR

The paper tackles fault-tolerant control for fixed-wing UAVs under large dynamical changes by replacing traditional inner-loop control with a transformer that maps outer-loop references $h_{ref}$, $heading_{ref}$, and $V_{T,ref}$ directly to control actions. It uses a teacher–student framework: a privileged DreamerV3-based agent trains on full observability, and a 0.8M-parameter Decision Transformer learns from offline expert trajectories under partial observability, leveraging in-context learning to adapt to failures without explicit fault detection. A Neural Lyapunov function $V_\theta(s)$ learned via Koopman eigenfunctions and integrated into the training objective promotes stability, improving robustness in extreme damage scenarios. Experiments in high-fidelity simulations show superior tracking and reduced crash rates compared with industry FCS and RL baselines, with the DT's lightweight footprint enabling feasible embedded deployment after quantization. The approach offers a practical path toward safe and adaptive UAV operations by unifying real-time adaptability, fault tolerance, and computational efficiency in a single transformer-based framework.

Abstract

This study presents a transformer-based approach for fault-tolerant control in fixed-wing Unmanned Aerial Vehicles (UAVs), designed to adapt in real time to dynamic changes caused by structural damage or actuator failures. Unlike traditional Flight Control Systems (FCSs) that rely on classical control theory and struggle under severe alterations in dynamics, our method directly maps outer-loop reference values -- altitude, heading, and airspeed -- into control commands using the in-context learning and attention mechanisms of transformers, thus bypassing inner-loop controllers and fault-detection layers. Employing a teacher-student knowledge distillation framework, the proposed approach trains a student agent with partial observations by transferring knowledge from a privileged expert agent with full observability, enabling robust performance across diverse failure scenarios. Experimental results demonstrate that our transformer-based controller outperforms industry-standard FCS and state-of-the-art reinforcement learning (RL) methods, maintaining high tracking accuracy and stability in nominal conditions and extreme failure cases, highlighting its potential for enhancing UAV operational safety and reliability.
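
As a rough illustration of the in-context mechanism described above, the sketch below shows how a Decision Transformer style controller is typically fed: each timestep contributes a return-to-go, an observation, and an action, and only the most recent steps are kept as context. The class name `ContextWindow`, the token labels, and the window length are assumptions made for this example; the paper's exact tokenization is not given in this summary.

```python
from collections import deque


def returns_to_go(rewards):
    """Suffix sums of a reward sequence: R_t = r_t + r_{t+1} + ... + r_T."""
    rtg, running = [], 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return rtg[::-1]


class ContextWindow:
    """Hypothetical rolling buffer of the last `max_steps` timesteps.

    A sequence model conditioned on this buffer can, in principle, infer a
    change in dynamics from recent (observation, action) pairs without a
    separate fault-detection module.
    """

    def __init__(self, max_steps):
        # deque with maxlen silently drops the oldest step when full
        self.steps = deque(maxlen=max_steps)

    def append(self, rtg, observation, action):
        self.steps.append((rtg, tuple(observation), tuple(action)))

    def tokens(self):
        """Interleave the buffer as (rtg, obs, act) triples, oldest first."""
        seq = []
        for rtg, obs, act in self.steps:
            seq.extend([("rtg", rtg), ("obs", obs), ("act", act)])
        return seq


if __name__ == "__main__":
    rtg = returns_to_go([1.0, 2.0, 3.0])
    window = ContextWindow(max_steps=2)
    for t in range(3):
        # observation and action vectors here are placeholders
        window.append(rtg[t], observation=[float(t)], action=[0.0])
    print(window.tokens())
```

Keeping the window short is what makes the model small enough to run at control rate; the trade-off is that the controller only "sees" a failure through the last few steps of history.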

Paper Structure

This paper contains 17 sections, 12 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Trajectory comparison between the proposed transformer-based controller (blue) and an industry-standard FCS (red). Panels (a) and (b) illustrate nominal scenario tracking, while (c) and (d) demonstrate the controllers’ responses to semi-wing damage, with the FCS losing control and the proposed method stabilizing the UAV.
  • Figure 2: Online training process of the RL algorithm using Domain Randomization and privileged information to enhance adaptability and robustness under varying dynamics. The Domain Randomization (DR) block applies transformations to the physical parameters of the environment, modifying the dynamics. The privileged agent receives as input the concatenation of the current physics parameters and the UAV state obtained from the environment. Based on this information, the agent takes actions over the control surfaces ($\delta_a, \delta_e, \delta_r$) and throttle ($\delta_T$).
  • Figure 3: Knowledge distillation process through offline reinforcement learning using partial observations of the POMDP, derived from expert trajectories generated by the privileged agent.
  • Figure 4: Sample trajectories comparing our DT method against the baseline FCS, RL trained with DR (RL+DR), base RL agent (RL), and trained GS system (FCS+RL) in the nominal and damaged wing scenarios for tracking reference values of altitude ($h$), heading ($\Psi$), and airspeed ($V_T$). (a) Nominal scenario. (b) Damaged wing scenario. The dashed black line represents the setpoint for each value, and the vertical dashed red line marks the timestep at which the failure occurs.
  • Figure 5: Comparison between the privileged agent teacher and DT student policies across each failure scenario, using mean episode return as the performance metric.
  • ...and 3 more figures
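
The captions of Figures 2 and 3 describe two observation regimes: the privileged teacher sees the UAV state concatenated with the randomized physics parameters, while the student sees only the partial observation. A minimal sketch of that split follows. The parameter names, nominal values, and scaling range are hypothetical placeholders chosen for illustration, not values from the paper.

```python
import random

# Hypothetical physical parameters; the paper's actual set is not listed here.
NOMINAL_PARAMS = {
    "mass_kg": 10.0,
    "wing_area_m2": 0.8,
    "aileron_effectiveness": 1.0,
}


def randomize_params(nominal, scale_range=(0.5, 1.0), rng=random):
    """Domain Randomization: scale each nominal parameter by a random factor."""
    return {
        name: value * rng.uniform(*scale_range)
        for name, value in nominal.items()
    }


def privileged_observation(uav_state, params):
    """Teacher input: UAV state concatenated with the current physics parameters."""
    # sorted keys give a fixed ordering of the parameter vector
    return list(uav_state) + [params[name] for name in sorted(params)]


def student_observation(uav_state):
    """Student input: the UAV state only, with no access to physics parameters."""
    return list(uav_state)


if __name__ == "__main__":
    rng = random.Random(0)
    params = randomize_params(NOMINAL_PARAMS, rng=rng)
    state = [120.0, 0.3, 22.0]  # placeholder altitude, heading, airspeed
    print(privileged_observation(state, params))
    print(student_observation(state))
```

In this arrangement the teacher's trajectories are recorded once and the student is trained offline on the `student_observation` view of the same episodes, which is the distillation step Figure 3 refers to.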