Tradeoffs When Considering Deep Reinforcement Learning for Contingency Management in Advanced Air Mobility
Luis E. Alvarez, Marc W. Brittain, Steven D. Young
TL;DR
The paper investigates how Deep Reinforcement Learning (DRL) can support autonomous contingency management (CM) in Advanced Air Mobility (AAM) by formulating CM as a Markov Decision Process (MDP) and comparing DRL agents to a heuristic baseline within the AAM-Gym framework. It examines two DRL algorithms, D2MAV-A and SACD, in a hazard- and energy-aware environment, highlighting the potential for DRL to improve safety and efficiency in high-density airspace. Key findings show DRL agents achieving higher reroute success and substantially fewer loss-of-control events than the heuristic baseline, albeit with heavy training demands and safety-verification challenges. The work also introduces a scalable simulation framework for evaluating safety-critical AI in aviation and discusses practical considerations such as catastrophic forgetting and the need for robust validation before deployment.
Abstract
Air transportation is undergoing a rapid evolution globally with the introduction of Advanced Air Mobility (AAM), and with it come novel challenges and opportunities for transforming aviation. As AAM operations introduce increasing heterogeneity in vehicle capabilities and density, increased levels of automation are likely necessary to achieve operational safety and efficiency goals. This paper focuses on one area where increased automation has been suggested: contingency management. Autonomous operations will need contingency management systems that can monitor evolving risk across a span of interrelated (or interdependent) hazards and, if necessary, execute appropriate control interventions via supervised or automated decision making. Accommodating this complex environment may require automated functions (autonomy) that apply artificial intelligence (AI) techniques able to adapt and respond to a quickly changing environment. This paper explores the use of Deep Reinforcement Learning (DRL), which has shown promising performance in complex and high-dimensional environments where the objective can be constructed as a sequential decision-making problem. The paper presents an extension of a prior formulation of the contingency management problem as a Markov Decision Process (MDP) and uses a DRL framework to train agents that mitigate hazards present in the simulation environment. A comparison of these learning-based agents and classical techniques is presented in terms of their performance, verification difficulties, and development process.
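To make the MDP framing concrete, the following is a minimal Python sketch of a toy contingency-management MDP. It is purely illustrative and is not the formulation used in the paper: the state (battery, distance, hazard flag), the three actions, the reward values, and all names (`ToyContingencyMDP`, `heuristic_policy`, `rollout`) are hypothetical assumptions. The rule-based `heuristic_policy` stands in for the kind of heuristic baseline that DRL agents are compared against.

```python
import random
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = 0  # fly one segment toward the destination
    REROUTE = 1   # detour around a hazard at extra energy cost
    LAND = 2      # divert to a landing site and end the flight


@dataclass(frozen=True)
class State:
    battery: int        # remaining energy, in abstract units (hypothetical)
    distance: int       # route segments left to the destination
    hazard_ahead: bool  # is a hazard blocking the next segment?


class ToyContingencyMDP:
    """Hypothetical toy MDP for illustration only (not the paper's model)."""

    def __init__(self, battery=10, distance=5, hazard_prob=0.2,
                 reroute_cost=2, seed=0):
        self.start = State(battery, distance, False)
        self.hazard_prob = hazard_prob
        self.reroute_cost = reroute_cost
        self.rng = random.Random(seed)  # seeded for reproducible episodes

    def reset(self):
        return self.start

    def step(self, s, a):
        """Apply action a in state s; return (next_state, reward, done)."""
        if a is Action.LAND:
            return s, -0.5, True   # safe outcome, but mission not completed
        if a is Action.CONTINUE and s.hazard_ahead:
            return s, -10.0, True  # flew into the hazard
        cost = self.reroute_cost if a is Action.REROUTE else 1
        battery = s.battery - cost
        if battery < 0:
            return s, -10.0, True  # energy exhausted before landing
        distance = s.distance - 1 if a is Action.CONTINUE else s.distance
        if distance == 0:
            return State(battery, 0, False), 1.0, True  # destination reached
        # Stochastic transition: a new hazard may appear on the next segment.
        hazard = self.rng.random() < self.hazard_prob
        return State(battery, distance, hazard), 0.0, False


def heuristic_policy(s, reroute_cost=2):
    """Rule-based baseline: avoid hazards, land when energy runs short."""
    if s.hazard_ahead:
        # Reroute only if, after the detour, enough energy remains to reach
        # the destination (assuming no further hazards); otherwise land.
        if s.battery - reroute_cost >= s.distance:
            return Action.REROUTE
        return Action.LAND
    if s.battery < s.distance:
        return Action.LAND
    return Action.CONTINUE


def rollout(env, policy, max_steps=100):
    """Run one episode and return the undiscounted return."""
    s, total = env.reset(), 0.0
    for _ in range(max_steps):
        s, r, done = env.step(s, policy(s))
        total += r
        if done:
            break
    return total
```

For example, `rollout(ToyContingencyMDP(hazard_prob=0.0), heuristic_policy)` flies five hazard-free segments and returns 1.0. In a DRL setting, `heuristic_policy` would be replaced by a policy learned by maximizing the expected return over many such episodes, which is where the training cost and verification difficulty discussed above come in.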
