Meta-Reinforcement Learning for Robust and Non-greedy Control Barrier Functions in Spacecraft Proximity Operations

Minduli C. Wijayatunga; Richard Linares; Roberto Armellin

TL;DR

The paper tackles safety-critical spacecraft proximity operations under thrust limits and uncertainty by introducing learnable input-constrained control barrier functions (ICCBFs) that parameterize the full class-$\mathcal{K}$ recursion and are tuned via meta-reinforcement learning. A margin computed with differential algebra guarantees forward invariance under time-sampled execution, enabling real-time QP-based safety enforcement. The authors compare MLP and RNN (LSTM) meta-policies on cruise control, docking, and 3D inspection tasks, reporting substantial fuel savings and improved feasibility, with the RNN performing best in complex, partially observed scenarios. The work advances practical safe autonomy for space missions by combining learnable barrier functions, efficient inter-sample margins, and memory-enabled adaptation to hidden parameters.
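
For reference, the standard ICCBF recursion for control-affine dynamics $\dot{x} = f(x) + g(x)u$ with $u \in \mathcal{U}$ is sketched below, with a parameter vector $\theta_i$ attached to each class-$\mathcal{K}$ function to mark where a learned parameterization enters. The notation and the linear example $\alpha_i(s;\theta_i) = \theta_i s$ are assumptions for illustration; the paper's own parameterization may differ.

```latex
\begin{aligned}
b_0(x) &= h(x), \\
b_i(x) &= \inf_{u \in \mathcal{U}} \Big[ L_f b_{i-1}(x) + L_g b_{i-1}(x)\,u
          + \alpha_{i-1}\big(b_{i-1}(x); \theta_{i-1}\big) \Big], \qquad i = 1, \dots, N, \\
\mathcal{C}^{*} &= \bigcap_{i=0}^{N} \{\, x : b_i(x) \ge 0 \,\}, \\
&\sup_{u \in \mathcal{U}} \Big[ L_f b_N(x) + L_g b_N(x)\,u
          + \alpha_N\big(b_N(x); \theta_N\big) \Big] \ge 0 \quad \forall x \in \mathcal{C}^{*}.
\end{aligned}
```

When the last condition holds, enforcing $L_f b_N(x) + L_g b_N(x)\,u + \alpha_N(b_N(x);\theta_N) \ge 0$ as a QP constraint keeps trajectories that start in $\mathcal{C}^{*}$ inside $\mathcal{C}^{*}$, and hence inside $\{x : h(x) \ge 0\}$, since $b_0 = h$. For a box input set $\mathcal{U} = \{u : |u_j| \le u_{\max}\}$ the infimum has the closed form $\inf_{u \in \mathcal{U}} L_g b\,u = -u_{\max}\|L_g b\|_1$. This guarantee is continuous-time; under zero-order-hold execution the constraint has to be tightened by an inter-sample margin, which is the role of the differential-algebra computation described here.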

Abstract

Autonomous spacecraft inspection and docking missions require controllers that can guarantee safety under thrust constraints and uncertainty. Input-constrained control barrier functions (ICCBFs) provide a framework for safety certification under bounded actuation; however, conventional ICCBF formulations can be overly conservative and exhibit limited robustness to uncertainty, leading to high fuel consumption and reduced mission feasibility. This paper proposes a framework in which the full hierarchy of class-$\mathcal{K}$ functions defining the ICCBF recursion is parameterized and learned, enabling localized shaping of the safe set and reduced conservatism. A control margin is computed efficiently using differential algebra to enable the learned continuous-time ICCBFs to be implemented on time-sampled dynamical systems typical of spacecraft proximity operations. A meta-reinforcement learning scheme is developed to train a policy that generates ICCBF parameters over a distribution of hidden physical parameters and uncertainties, using both multilayer perceptron (MLP) and recurrent neural network (RNN) architectures. Simulation results on cruise control, spacecraft inspection, and docking scenarios demonstrate that the proposed approach maintains safety while reducing fuel consumption and improving feasibility relative to fixed class-$\mathcal{K}$ ICCBFs, with the RNN showing a particularly strong advantage in the more complex inspection case.
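
The toy sketch below illustrates, under strong simplifying assumptions, how an ICCBF recursion, an inter-sample margin, and a QP safety filter fit together. It is not the authors' implementation: the 1-D double integrator ($\dot p = v$, $\dot v = u$, $|u| \le u_{\max}$), the wall constraint $h = p_{\max} - p$, the linear class-$\mathcal{K}$ functions $\alpha_i(s) = \theta_i s$, and the assumed speed bound `v_max` are all illustrative choices, every function name is hypothetical, and the margin is a crude Lipschitz-style bound swapped in for the differential-algebra margin, not that method. With a single input the QP reduces to projecting the nominal command onto an interval, so no solver is needed. In the paper the $\theta$ values come from the meta-learned policy at each step; here they are fixed constants.

```python
# Hypothetical toy example: 1-D double integrator p' = v, v' = u, |u| <= u_max,
# safe set h = p_max - p >= 0, linear class-K functions alpha_i(s) = theta_i * s.
# The inter-sample margin is a Lipschitz-style stand-in, NOT a differential-algebra margin.
from dataclasses import dataclass


@dataclass(frozen=True)
class Affine:
    """b(x) = cp * p + cv * v + c0, an affine function of the state x = (p, v)."""
    cp: float
    cv: float
    c0: float

    def __call__(self, p: float, v: float) -> float:
        return self.cp * p + self.cv * v + self.c0


def iccbf_recursion(p_max, u_max, thetas):
    """Build [b_0, ..., b_N] with N = len(thetas) - 1 and alpha_i(s) = thetas[i] * s.

    b_0 = h = p_max - p, and b_i = inf_{|u| <= u_max} [L_f b_{i-1} + L_g b_{i-1} u
    + alpha_{i-1}(b_{i-1})]. Here L_f b = cp * v and L_g b = cv, so each b_i stays
    affine and the infimum over the input box equals -u_max * |cv|.
    """
    bs = [Affine(-1.0, 0.0, p_max)]
    for theta in thetas[:-1]:  # thetas[-1] is used only in the final QP constraint
        b = bs[-1]
        bs.append(Affine(theta * b.cp,
                         b.cp + theta * b.cv,
                         theta * b.c0 - u_max * abs(b.cv)))
    return bs


def zoh_margin(b_n, theta_n, u_max, v_max, dt):
    """Upper bound on the drop of psi = L_f b_N + L_g b_N u + theta_N b_N over one
    zero-order-hold interval of length dt, assuming |u| <= u_max and |v| <= v_max."""
    rate = abs(b_n.cp) * u_max + theta_n * (abs(b_n.cp) * v_max + abs(b_n.cv) * u_max)
    return rate * dt + 0.5 * theta_n * abs(b_n.cp) * u_max * dt ** 2


def safety_filter(p, v, u_nom, b_n, theta_n, u_max, margin):
    """Closed-form solution of min (u - u_nom)^2 s.t. psi(x, u) >= margin, |u| <= u_max.

    psi is affine in u, so the feasible set is an interval. Returns (u, feasible)."""
    a = b_n.cv                                       # L_g b_N, coefficient of u
    c = margin - (b_n.cp * v + theta_n * b_n(p, v))  # constraint reads a * u >= c
    lo, hi = -u_max, u_max
    if a > 0:
        lo = max(lo, c / a)
    elif a < 0:
        hi = min(hi, c / a)
    elif c > 0:                                      # u has no effect and psi < margin
        return min(max(u_nom, lo), hi), False
    if lo > hi:                                      # constraint unattainable within bounds:
        return (lo if a < 0 else hi), False          # saturate in the direction that helps most
    return min(max(u_nom, lo), hi), True


def simulate(p_max=10.0, u_max=1.0, thetas=(1.0, 1.0, 1.0), dt=0.05, v_max=5.0, steps=400):
    """Push toward the wall with u_nom = +u_max and let the filter intervene."""
    bs = iccbf_recursion(p_max, u_max, list(thetas))
    margin = zoh_margin(bs[-1], thetas[-1], u_max, v_max, dt)
    p, v, log = 0.0, 0.0, []
    for _ in range(steps):
        u, ok = safety_filter(p, v, u_max, bs[-1], thetas[-1], u_max, margin)
        p, v = p + v * dt + 0.5 * u * dt ** 2, v + u * dt  # exact ZOH step of the double integrator
        log.append((p, v, u, ok))
    return log


if __name__ == "__main__":
    log = simulate()
    print(f"max position {max(s[0] for s in log):.3f}, all feasible: {all(s[3] for s in log)}")
```

Changing `thetas` changes where the vehicle comes to rest and how hard it brakes, which is the kind of conservatism trade-off the learned parameters are meant to tune; the margin also grows with `dt`, so coarser sampling tightens the constraint.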

Paper Structure (30 sections, 37 equations, 12 figures, 4 tables, 1 algorithm)

This paper contains 30 sections, 37 equations, 12 figures, 4 tables, 1 algorithm.

Introduction
Theoretical Background
Problem Setup
Time-sampled Execution and Zero-Order Hold
Control Barrier Functions
Control Lyapunov Functions
Input-Constrained Control Barrier Functions
Meta-Reinforcement Learning
Methodology
ICCBF Recursion and Class-$\mathcal{K}$ Parameterization
Control Margin Computation
QP Formulations
Meta-RL Formulation
Network Architecture
Initial State and Hidden Parameter Distributions
...and 15 more sections

Figures (12)

Figure 1: Meta-RL network architecture. State $x_k$ is mapped through separate actor/critic MLP feature extractors and separate LSTMs, producing ICCBF parameter outputs $\boldsymbol{\theta}_k$ from the actor head and value estimates $V(x_k)$ from the critic head. PPO updates are performed using rollouts across tasks $\mathcal{M}_i \sim p(\mathcal{M})$.
Figure 2: Cruise control: Monte Carlo (MC) parameter variations across the 5000 samples
Figure 3: Cruise control total thrust distributions
Figure 4: Cruise Control trajectories and CBF, CLF variation over time
Figure 5: Spacecraft Docking Problem
...and 7 more figures

Meta-Reinforcement Learning for Robust and Non-greedy Control Barrier Functions in Spacecraft Proximity Operations

TL;DR

Abstract

Meta-Reinforcement Learning for Robust and Non-greedy Control Barrier Functions in Spacecraft Proximity Operations

Authors

TL;DR

Abstract

Table of Contents

Figures (12)