Hypernetwork-Conditioned Reinforcement Learning for Robust Control of Fixed-Wing Aircraft under Actuator Failures

Dennis Marquis, Mazen Farhood

Abstract

This paper presents a reinforcement learning-based path-following controller for a fixed-wing small uncrewed aircraft system (sUAS) that is robust to certain actuator failures. The controller is conditioned on a parameterization of actuator faults using hypernetwork-based adaptation. We consider parameter-efficient formulations based on Feature-wise Linear Modulation (FiLM) and Low-Rank Adaptation (LoRA), trained using proximal policy optimization. We demonstrate that hypernetwork-conditioned policies can improve robustness compared to standard multilayer perceptron policies. In particular, hypernetwork-conditioned policies generalize effectively to time-varying actuator failure modes not encountered during training. The approach is validated through high-fidelity simulations, using a realistic six-degree-of-freedom fixed-wing aircraft model.
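To make the conditioning schemes concrete, here is a minimal sketch (not the authors' implementation; layer sizes, the fault parameterization, and all variable names are illustrative assumptions) of how a hypernetwork can map a fault vector to FiLM scale/shift parameters, or to LoRA low-rank weight updates, that modulate one hidden layer of a policy MLP:

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN = 8      # hidden width of the policy layer (hypothetical)
FAULT_DIM = 3   # size of the actuator-fault parameterization (hypothetical)
OBS_DIM = 4     # observation size (hypothetical)
RANK = 2        # LoRA rank (hypothetical)

# Base policy-layer weights (would be trained with PPO in practice)
W = rng.normal(size=(HIDDEN, OBS_DIM))
b = np.zeros(HIDDEN)

# Hypernetworks: here simple linear maps from the fault vector to
# (a) FiLM scale/shift and (b) LoRA factors A, B.
H_film = rng.normal(size=(2 * HIDDEN, FAULT_DIM)) * 0.1
H_a = rng.normal(size=(RANK * OBS_DIM, FAULT_DIM)) * 0.1
H_b = rng.normal(size=(HIDDEN * RANK, FAULT_DIM)) * 0.1

def film_layer(obs, fault):
    """FiLM: h = tanh(gamma * (W @ obs + b) + beta), with (gamma, beta)
    generated from the fault vector; zero fault gives identity modulation."""
    p = H_film @ fault
    gamma = 1.0 + p[:HIDDEN]   # scale, centered at identity
    beta = p[HIDDEN:]          # shift
    return np.tanh(gamma * (W @ obs + b) + beta)

def lora_layer(obs, fault):
    """LoRA: h = tanh((W + B @ A) @ obs + b), with low-rank factors
    A, B generated from the fault vector; zero fault gives W unchanged."""
    A = (H_a @ fault).reshape(RANK, OBS_DIM)
    B = (H_b @ fault).reshape(HIDDEN, RANK)
    return np.tanh((W + B @ A) @ obs + b)

obs = rng.normal(size=OBS_DIM)
fault = np.array([0.5, 0.0, -0.3])  # e.g. a stuck-surface magnitude encoding
h_film = film_layer(obs, fault)
h_lora = lora_layer(obs, fault)
```

Note the shared design point: with a zero fault vector, both formulations reduce to the unmodulated base layer, so the nominal policy is recovered exactly when no failure is present.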

Paper Structure

This paper contains 21 sections, 11 equations, 4 figures, 4 tables.

Figures (4)

  • Figure E1: Average MaxPE across failure magnitudes for each actuator. Top: static failures. Bottom: flutter failures.
  • Figure E2: Example rudder flutter signal from a simulation episode. This type of conditioning signal is used to assess generalization to unseen, nonstationary failures.
  • Figure E3: State and control histories for an MLP WC episode under rudder flutter, compared against the FiLM + HC policy.
  • Figure E4: State and control histories for a FiLM + HC WC episode under rudder flutter, compared against the MLP policy.

Theorems & Definitions (1)

  • Remark 1