Space Processor Computation Time Analysis for Reinforcement Learning and Run Time Assurance Control Policies

Kyle Dunlap; Nathaniel Hamilton; Francisco Viramontes; Derrek Landauer; Evan Kain; Kerianne L. Hobbs

TL;DR

This study evaluates the real-time feasibility of neural network controllers (NNCs) trained with reinforcement learning and of run time assurance (RTA) filters for autonomous spacecraft inspection tasks. Using NNCs trained with Proximal Policy Optimization (PPO) and ASIF-based RTA with control barrier functions, the authors benchmark computation time on both commercial off-the-shelf (COTS) NVIDIA boards and a radiation-tolerant Unibap iX10 platform. Results show that NNCs and most RTA variants compute safe, near-optimal actions within milliseconds: some configurations approach sub-millisecond times on the iX10, and times on COTS hardware stay under a few tens of milliseconds. The findings suggest that an NNC plus RTA pipeline can operate in real time aboard spacecraft, with eASIF generally preferred for its speed and safety guarantees. They also point to areas for optimization and to future testing on radiation-hardened platforms.

Abstract

As the number of spacecraft on orbit continues to grow, it is challenging for human operators to constantly monitor and plan for all missions. Autonomous control methods such as reinforcement learning (RL) can solve complex tasks while reducing the need for constant operator intervention. By combining RL solutions with run time assurance (RTA), the safety of these systems can be assured in real time. However, to be used on board a spacecraft, these algorithms must run in real time on space-grade processors, which are typically outdated and less capable than state-of-the-art equipment. In this paper, multiple RL-trained neural network controllers (NNCs) and RTA algorithms were tested on commercial off-the-shelf (COTS) and radiation-tolerant processors. The results show that all NNCs and most RTA algorithms can compute optimal and safe actions in well under 1 second, with room for further optimization before real-world deployment.
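To make the measured quantity concrete, the following is a minimal sketch of a per-call timing harness for a controller followed by a safety filter. It is not the authors' benchmark code. The network sizes, the random weights, the `clip_filter` stand-in (simple actuator saturation, not an ASIF or control barrier function filter), and the trial count are all assumptions chosen for illustration.

```python
import math
import random
import statistics
import time


def make_mlp(sizes, seed=0):
    """Build random weights for a tanh MLP; sizes are hypothetical, e.g. [obs, hidden, hidden, act]."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def nnc_forward(layers, obs):
    """Forward pass standing in for a trained NNC (the weights here are random, not trained)."""
    x = obs
    for weights, biases in layers:
        x = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x


def clip_filter(action, u_max=1.0):
    """Placeholder safety filter: saturates each control component to [-u_max, u_max]."""
    return [max(-u_max, min(u_max, u)) for u in action]


def time_calls(fn, n_trials=200):
    """Return the wall-clock time of each call to fn, in milliseconds."""
    samples = []
    for _ in range(n_trials):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return samples


# Assumed dimensions: 6 observations, two hidden layers of 64 units, 3 control outputs.
layers = make_mlp([6, 64, 64, 3])
obs = [0.1, -0.2, 0.3, 0.0, 0.05, -0.05]
samples = time_calls(lambda: clip_filter(nnc_forward(layers, obs)))
print(f"mean {statistics.mean(samples):.3f} ms, max {max(samples):.3f} ms")
```

Reporting the maximum alongside the mean matters for a real-time argument, since a control loop must meet its deadline on the slowest call and not only on average. Numbers from this sketch depend entirely on the host machine and say nothing about the processors studied in the paper.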

Paper Structure (22 sections, 19 equations, 12 figures, 8 tables)

This paper contains 22 sections, 19 equations, 12 figures, 8 tables.

Introduction
Background
Reinforcement Learning
Run Time Assurance
Space Hardware
Experimental Setup
Inspection Task
Dynamics
RL Environment Setup
Safety Constraints
Neural Network Controllers
Space Processors
COTS
Radiation Tolerant
Experiments
...and 7 more sections

Figures (12)

Figure 1: Feedback control system with RTA filter. Components with low safety confidence are outlined in red, and components with high safety confidence are outlined in blue.
Figure 2: Deputy spacecraft in relation to a chief spacecraft in Hill's Frame.
Figure 3: The NNC architecture for the "all sensors" variation of the inspection task.
Figure 4: NNC only (all sensors)
Figure 5: RTA only - AGX Orin
...and 7 more figures
