A Markov Decision Process Framework for Early Maneuver Decisions in Satellite Collision Avoidance
Authors
Francesca Ferrara, Lander W. Schillinger Arana, Florian Dörfler, Sarah H. Q. Li
Abstract
We develop a Markov decision process (MDP) framework for autonomous guidance decisions in satellite collision avoidance maneuvers (CAMs), together with a reinforcement learning policy gradient (RL-PG) algorithm that directly optimizes the guidance policy on historical CAM data. In addition to maintaining acceptable collision risk, this approach seeks to minimize the average propellant consumption of CAMs by making early maneuver decisions. We model CAM as a continuous-state, discrete-action, finite-horizon MDP in which the critical decision is when to initiate the maneuver. The MDP's rewards are computed from analytical models of collision risk, propellant consumption, and transit-orbit geometry. By deciding to maneuver earlier than conventional methods, the Markov policy favors CAMs that achieve comparable collision-risk reduction while consuming less propellant. Using historical data of tracked conjunction events, we validate this framework and conduct an extensive parameter-sensitivity study. When evaluated on synthetic conjunction events, the trained policy consumes significantly less propellant, both overall and per maneuver, than a conventional cutoff policy that initiates maneuvers 24 hours before the time of closest approach (TCA). On historical conjunction events, the trained policy consumes more propellant overall but less propellant per maneuver. For both historical and synthetic conjunction events, the trained policy is slightly more conservative than cutoff policies in identifying conjunction events that warrant CAMs.
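To make the abstract's setup concrete, the sketch below shows one plausible shape of such a problem: a finite-horizon MDP whose only decision at each epoch is "wait" or "maneuver now", trained with a REINFORCE-style policy gradient. Everything here is a hypothetical stand-in; the state features, the propellant model `delta_v_cost`, the reward weights, and the risk threshold are illustrative placeholders, not the paper's analytical models.

```python
# Minimal sketch (assumed, not the paper's method): a finite-horizon MDP where
# the agent decides *when* to maneuver, trained with REINFORCE plus a running
# baseline. All dynamics, features, and weights are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)

HORIZON = 48                   # decision epochs, e.g. hourly from 48 h before TCA
RISK_LIMIT = 1e-4              # hypothetical acceptable collision probability
W_RISK, W_FUEL = 10.0, 1.0     # hypothetical reward weights

def sample_event():
    """Hypothetical conjunction event with a random collision probability."""
    return {"p_collision": 10 ** rng.uniform(-6, -3)}

def delta_v_cost(epochs_to_tca):
    """Hypothetical propellant model: an earlier burn is cheaper because a
    smaller delta-v achieves the same miss distance at TCA."""
    return 1.0 / max(epochs_to_tca, 1)

def features(event, t):
    # Toy state: log collision probability, normalized time to TCA, bias term.
    return np.array([np.log10(event["p_collision"]), (HORIZON - t) / HORIZON, 1.0])

def prob_maneuver(theta, x):
    # Logistic policy over the binary action set {wait, maneuver}.
    return 1.0 / (1.0 + np.exp(-theta @ x))

def run_episode(theta):
    event, traj, total_r = sample_event(), [], 0.0
    for t in range(HORIZON):
        x = features(event, t)
        p = prob_maneuver(theta, x)
        a = rng.random() < p
        traj.append((x, a, p))
        if a:  # maneuver now: pay the propellant cost, risk is removed
            total_r -= W_FUEL * delta_v_cost(HORIZON - t)
            return traj, total_r
    if event["p_collision"] > RISK_LIMIT:  # never maneuvered on a risky event
        total_r -= W_RISK
    return traj, total_r

theta, baseline = np.zeros(3), 0.0
for it in range(5000):
    traj, R = run_episode(theta)
    baseline += 0.05 * (R - baseline)  # running average as a variance-reducing baseline
    for x, a, p in traj:
        # REINFORCE: for a Bernoulli policy, grad log pi(a|x) = (a - p) * x.
        theta += 0.01 * (R - baseline) * ((1.0 if a else 0.0) - p) * x

print("trained policy weights:", theta)
```

Framing the maneuver-timing decision this way turns CAM guidance into a stopping problem: the toy cost model rewards early burns, mirroring the abstract's claim that earlier maneuvers achieve comparable risk reduction with less propellant, while the paper itself derives the state, dynamics, and rewards from orbital-mechanics models rather than these stand-ins.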