Safe and Reliable Training of Learning-Based Aerospace Controllers

Udayan Mandal; Guy Amir; Haoze Wu; Ieva Daukantas; Fletcher Lee Newell; Umberto Ravaioli; Baoluo Meng; Michael Durling; Kerianne Hobbs; Milan Ganai; Tobey Shim; Guy Katz; Clark Barrett

Safe and Reliable Training of Learning-Based Aerospace Controllers

Udayan Mandal, Guy Amir, Haoze Wu, Ieva Daukantas, Fletcher Lee Newell, Umberto Ravaioli, Baoluo Meng, Michael Durling, Kerianne Hobbs, Milan Ganai, Tobey Shim, Guy Katz, Clark Barrett

TL;DR

The paper tackles the challenge of safely deploying learning-based aerospace controllers by integrating verification into the DRL training loop (design-for-verification) and by applying formal tools such as $k$-induction, Neural Lyapunov Barrier certificates, and reachability methods to a 2D spacecraft docking benchmark. It demonstrates a scalable $k$-induction framework to prove liveness, aided by a verification-friendly DNN setup and a Manhattan-distance based ranking invariant, and shows that CERTIFICATES can provide safety guarantees in parallel. The study also evaluates alternative verification approaches, including RWA certificates and reachability analyses, highlighting where these methods succeed or fail on the benchmark. Together, the work advances practical safety guarantees for DRL in avionics by combining training adaptations with formal verification methods, and it outlines future directions for scaling to more complex aerospace systems.

Abstract

In recent years, deep reinforcement learning (DRL) approaches have generated highly successful controllers for a myriad of complex domains. However, the opaque nature of these models limits their applicability in aerospace systems and safety-critical domains, in which a single mistake can have dire consequences. In this paper, we present novel advancements in both the training and verification of DRL controllers, which can help ensure their safe behavior. We showcase a design-for-verification approach utilizing k-induction and demonstrate its use in verifying liveness properties. In addition, we also give a brief overview of neural Lyapunov Barrier certificates and summarize their capabilities on a case study. Finally, we describe several other novel reachability-based approaches which, despite failing to provide guarantees of interest, could be effective for verification of other DRL systems, and could be of further interest to the community.

Safe and Reliable Training of Learning-Based Aerospace Controllers

TL;DR

-induction, Neural Lyapunov Barrier certificates, and reachability methods to a 2D spacecraft docking benchmark. It demonstrates a scalable

-induction framework to prove liveness, aided by a verification-friendly DNN setup and a Manhattan-distance based ranking invariant, and shows that CERTIFICATES can provide safety guarantees in parallel. The study also evaluates alternative verification approaches, including RWA certificates and reachability analyses, highlighting where these methods succeed or fail on the benchmark. Together, the work advances practical safety guarantees for DRL in avionics by combining training adaptations with formal verification methods, and it outlines future directions for scaling to more complex aerospace systems.

Abstract

Paper Structure (49 sections, 2 theorems, 22 equations, 3 figures, 2 algorithms)

This paper contains 49 sections, 2 theorems, 22 equations, 3 figures, 2 algorithms.

Introduction
Preliminaries and Related Work
Safety and Liveness
Safety.
Liveness.
DNNs, DNN Verification, and Dynamical Systems.
Deep Learning.
DNN Verification.
Discrete-Time Dynamical Systems.
Control Lyapunov Barrier Functions
Lyapunov Functions.
Barrier Functions.
Control Lyapunov Barrier Functions.
Reach-while-Avoid Tasks.
Other Verification Approaches
...and 34 more sections

Key Result

Proposition 1

If property eq:ind holds (for some $k$) for every state, then eventually the spacecraft will be moving towards the goal (i.e., the $L^1$ norm of the position will decrease).

Figures (3)

Figure 1: Design for Verification: An initial controller trajectory compared to a final controller trajectory, with the same initial state. The final controller has a more direct trajectory which is more conducive to verification via $k$-induction.
Figure 2: Grid reachability, with a cell navigating towards the docking region (in green)
Figure 3: Spurious trajectory with grid reachability

Theorems & Definitions (4)

definition 1: Reach-while-Avoid Task
Proposition 1
definition 2
Lemma 1

Safe and Reliable Training of Learning-Based Aerospace Controllers

TL;DR

Abstract

Safe and Reliable Training of Learning-Based Aerospace Controllers

Authors

TL;DR

Abstract

Table of Contents

Key Result

Figures (3)

Theorems & Definitions (4)