Tradeoffs When Considering Deep Reinforcement Learning for Contingency Management in Advanced Air Mobility

Luis E. Alvarez; Marc W. Brittain; Steven D. Young

TL;DR

The paper investigates how Deep Reinforcement Learning (DRL) can support autonomous contingency management (CM) in Advanced Air Mobility (AAM) by formulating CM as a Markov Decision Process (MDP) and comparing DRL agents to a heuristic baseline within the AAM-Gym framework. It examines two DRL algorithms, D2MAV-A and SACD, in a hazard- and energy-aware environment, highlighting the potential for DRL to improve safety and efficiency in high-density airspace. Key findings show DRL agents achieving higher reroute success and substantially fewer loss-of-control events than the heuristic baseline, at the cost of considerable training demands and safety-verification challenges. The work also introduces a scalable simulation framework for evaluating safety-critical AI in aviation and discusses practical considerations such as catastrophic forgetting and the need for robust validation before deployment.

Abstract

Air transportation is undergoing a rapid evolution globally with the introduction of Advanced Air Mobility (AAM), and with it come novel challenges and opportunities for transforming aviation. As AAM operations introduce increasing heterogeneity in vehicle capabilities and density, increased levels of automation are likely necessary to achieve operational safety and efficiency goals. This paper focuses on one example where increased automation has been suggested. Autonomous operations will need contingency management systems that can monitor evolving risk across a span of interrelated (or interdependent) hazards and, if necessary, execute appropriate control interventions via supervised or automated decision making. Accommodating this complex environment may require automated functions (autonomy) that apply artificial intelligence (AI) techniques that can adapt and respond to a quickly changing environment. This paper explores the use of Deep Reinforcement Learning (DRL), which has shown promising performance in complex and high-dimensional environments where the objective can be constructed as a sequential decision-making problem. An extension of a prior formulation of the contingency management problem as a Markov Decision Process (MDP) is presented, and a DRL framework is used to train agents that mitigate hazards present in the simulation environment. These learning-based agents are compared with classical techniques in terms of their performance, verification difficulties, and development process.

Paper Structure (17 sections, 16 equations, 7 figures, 4 tables, 1 algorithm)

Introduction
Background
Problem Formulation
DRL Algorithms
Baseline Simulation and Heuristic Algorithm
Hazard Modeling
Aircraft Energy Modeling
Scenario Generation
Metrics
Algorithm Design
Curriculum Training
State Space
Reward Model
Experiment Setup
Results
...and 2 more sections

Figures (7)

Figure 1: Network utilized for evaluation, with a loss-of-control hazard placed to force the majority of nominal traffic to traverse the hazard region. Vertiport locations are displayed with red circles, and nodes defining network segments are displayed with green circles. The colormap indicates the distribution of hazard severity within the hazard region (red, more severe; blue, less severe).
Figure 2: Evaluations with full range of energy reserves.
Figure 3: Stressing evaluations with reduced energy reserves.
Figure 5: Curriculum T4 learning curves of SACD-A agent.
Figure 6: Curriculum T4 learning curves of D2MAV-A agent.
...and 2 more figures
