Markov Decision Processes for Satellite Maneuver Planning and Collision Avoidance

William Kuhl; Jun Wang; Duncan Eddy; Mykel Kochenderfer

TL;DR

This work models satellite collision avoidance as a Markov decision process to enable online, optimal maneuver planning for large LEO constellations under information updates and state uncertainty. It introduces a time-aware Monte Carlo Tree Search (MCTS) framework with limited-horizon and full-horizon variants, along with exploration heuristics, and evaluates it against rule-based baselines in extensive simulations. The results show that MCTS, especially the full-horizon variant with the stochastic depth heuristic, can reduce maneuver costs and maintain high safety across encounters, with the horizon and exploration choices tuning the safety-cost trade-off. The approach promises practical benefits for onboard, real-time decision making with uncertain conjunction data message (CDM) updates, and suggests extensions to richer action spaces and incremental maneuvers for further fuel efficiency gains.

Abstract

This paper presents a decentralized, online planning approach for scalable maneuver planning for large constellations. While decentralized, rule-based strategies have facilitated efficient scaling, optimal decision-making algorithms for satellite maneuvers remain underexplored. As commercial satellite constellations grow, there are benefits of online maneuver planning, such as using real-time trajectory predictions to improve state knowledge, thereby reducing maneuver frequency and conserving fuel. We address this gap in the research by treating the satellite maneuver planning problem as a Markov decision process (MDP). This approach enables the generation of optimal maneuver policies online with low computational cost. This formulation is applied to the low Earth orbit collision avoidance problem, considering the problem of an active spacecraft deciding to maneuver to avoid a non-maneuverable object. We test the policies we generate in a simulated low Earth orbit environment, and compare the results to traditional rule-based collision avoidance techniques.

Paper Structure (23 sections, 6 equations, 5 figures, 5 tables, 7 algorithms)

Introduction
Markov Decision Process Formulation
State Space
Action Space
Transition Function
Reward Function
Solution Methods
MCTS: Limited Horizon
Exploration Policies
UCB1
Stochastic Depth Heuristic
Baseline Methods
Experiments
Simulation Characteristics
Results
...and 8 more sections

Figures (5)

Figure 1: Simulated encounters between a satellite and a piece of debris. The line at $P_c = 10^{-5}$ represents a common operator-defined risk threshold. As the time until closest approach decreases, $P_c$ is updated to reflect new state estimates.
Figure 2: Examples of safe encounters. This plot shows how the probability of collision is updated as the time until closest approach decreases if no maneuver is taken. An encounter is safe if the probability of collision falls below some threshold as the time until closest approach goes to 0.
Figure 3: Examples of unsafe encounters. This plot shows how the probability of collision is updated as the time until closest approach decreases if no maneuver is taken. An encounter is unsafe if the probability of collision rises above some threshold as the time until closest approach goes to 0.
Figure 4: Examples of unsafe encounters where $P_c<P_{c,\text{threshold}}$ when $t=8$ h. This is a subset of unsafe encounters in which $P_c$ rises from below the safety threshold to above it between the last state update and the time of closest approach. Consequently, not all rule-based baselines successfully mitigate risk at the final state update.
Figure 5: Each maneuvering algorithm is plotted with rule-based methods in blue and MCTS methods in red. The horizontal axis represents the average $\Delta v$ spent per safe encounter, while the vertical axis is the average $\Delta v$ spent per unsafe encounter. The rule-based method with $t_\text{cutoff}=72$ hours is the theoretical lower bound on $\Delta v$ for unsafe encounters.
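
The safe and unsafe labels used in Figures 2 to 4 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the `(hours_to_closest_approach, P_c)` history format, the function names, the sample values, and the choice to count $P_c$ exactly at the threshold as unsafe are all assumptions, and the threshold of 1e-5 is our reading of the risk line in Figure 1.

```python
from typing import List, Optional, Tuple

# One P_c estimate: (hours until closest approach, probability of collision).
# This record layout is hypothetical, chosen only for illustration.
Sample = Tuple[float, float]

# Assumed risk threshold (our reading of the line in Figure 1).
PC_THRESHOLD = 1e-5


def final_pc(history: List[Sample]) -> float:
    """Return the P_c estimate nearest to the time of closest approach."""
    return min(history, key=lambda s: s[0])[1]


def is_unsafe(history: List[Sample], threshold: float = PC_THRESHOLD) -> bool:
    """Unsafe if the final P_c is at or above the threshold (tie rule assumed)."""
    return final_pc(history) >= threshold


def pc_at_update(history: List[Sample], update_h: float) -> Optional[float]:
    """P_c from the most recent estimate made no later than `update_h` hours out."""
    earlier = [s for s in history if s[0] >= update_h]
    if not earlier:
        return None
    return min(earlier, key=lambda s: s[0])[1]


def is_late_riser(
    history: List[Sample],
    update_h: float = 8.0,
    threshold: float = PC_THRESHOLD,
) -> bool:
    """Figure 4 subset: below threshold at the last update, yet unsafe at the end."""
    pc = pc_at_update(history, update_h)
    return pc is not None and pc < threshold and is_unsafe(history, threshold)


# Made-up histories with no maneuver taken; the values are not from the paper.
safe_history = [(72.0, 3e-5), (24.0, 8e-6), (8.0, 2e-6), (0.0, 1e-7)]
late_history = [(72.0, 2e-6), (24.0, 4e-6), (8.0, 6e-6), (0.0, 5e-5)]
```

With these made-up histories, `safe_history` is labeled safe even though it starts above the threshold, while `late_history` falls in the Figure 4 subset: a rule that checks $P_c$ only at the 8 h update would not maneuver, yet the encounter ends above the threshold.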
