Optimal Multi-Debris Mission Planning in LEO: A Deep Reinforcement Learning Approach with Co-Elliptic Transfers and Refueling

Agni Bandyopadhyay; Gunther Waxenegger-Wilfing

TL;DR

The paper tackles multi-target active debris removal (ADR) in Low Earth Orbit (LEO) by introducing a unified co-elliptic maneuver framework that integrates Hohmann transfers, safety ellipse proximal approaches, and explicit refueling logic. It compares a Greedy heuristic, Monte Carlo Tree Search (MCTS), and a Masked PPO reinforcement learning agent, all under the same orbital dynamics and resource constraints, demonstrating that the RL approach achieves the best overall balance of mission efficiency and computational feasibility. Across 100 randomized debris fields with a $\Delta V$ budget of $3$ km/s and a 7-day horizon, the Masked PPO policy visits about $29$–$32$ debris (vs. $15$–$18$ for Greedy and $25$–$29$ for MCTS) while maintaining runtimes of about 1–2 seconds, whereas MCTS can require $10^3$–$10^4$ seconds. These results highlight the potential of modern RL methods for scalable, safe, and resource-efficient ADR planning in realistic space mission settings.

Abstract

This paper addresses the challenge of multi-target active debris removal (ADR) in Low Earth Orbit (LEO) by introducing a unified co-elliptic maneuver framework that combines Hohmann transfers, safety-ellipse proximity operations, and explicit refueling logic. We benchmark three distinct planning algorithms, namely a Greedy heuristic, Monte Carlo Tree Search (MCTS), and deep reinforcement learning (RL) using Masked Proximal Policy Optimization (PPO), within a realistic orbital simulation environment featuring randomized debris fields, keep-out zones, and $\Delta V$ constraints. Experimental results over 100 test scenarios demonstrate that Masked PPO achieves superior mission efficiency and computational performance, visiting up to twice as many debris as Greedy and significantly outperforming MCTS in runtime. These findings underscore the promise of modern RL methods for scalable, safe, and resource-efficient space mission planning, paving the way for future advancements in ADR autonomy.
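
To make the $\Delta V$ bookkeeping behind a Hohmann transfer concrete, the following is a minimal sketch using the textbook two-impulse formula for coplanar circular orbits. It is not the authors' implementation: the function names, the 700 km and 800 km altitudes, and the simple affordability check are illustrative assumptions, and the sketch ignores the safety ellipse, refueling, and any plane changes. Only the 3 km/s budget is taken from the summary above.

```python
import math

MU_EARTH = 398600.4418  # Earth's gravitational parameter, km^3/s^2
R_EARTH = 6378.137      # Earth's equatorial radius, km


def hohmann_transfer(r1_km, r2_km, mu=MU_EARTH):
    """Return (total delta-V in km/s, transfer time in s) for a two-impulse
    Hohmann transfer between coplanar circular orbits of radius r1 and r2."""
    a_transfer = 0.5 * (r1_km + r2_km)   # semi-major axis of the transfer ellipse
    v1 = math.sqrt(mu / r1_km)           # circular speed on the initial orbit
    v2 = math.sqrt(mu / r2_km)           # circular speed on the target orbit
    dv1 = v1 * (math.sqrt(r2_km / a_transfer) - 1.0)  # burn entering the transfer ellipse
    dv2 = v2 * (1.0 - math.sqrt(r1_km / a_transfer))  # burn circularizing at the target
    total_dv = abs(dv1) + abs(dv2)
    time_of_flight = math.pi * math.sqrt(a_transfer**3 / mu)  # half the ellipse period
    return total_dv, time_of_flight


def can_afford(cost_km_s, remaining_km_s):
    """Hypothetical feasibility check: is the transfer within the remaining budget?"""
    return cost_km_s <= remaining_km_s


# Illustrative altitudes (assumed, not from the paper): 700 km -> 800 km.
dv, tof = hohmann_transfer(R_EARTH + 700.0, R_EARTH + 800.0)
budget_km_s = 3.0  # Delta-V budget quoted in the summary
print(f"delta-V = {dv * 1000:.1f} m/s, time of flight = {tof / 60:.1f} min")
print("feasible:", can_afford(dv, budget_km_s))
```

For these assumed altitudes the transfer costs on the order of 0.05 km/s, which gives a feel for how many in-plane hops a 3 km/s budget could cover before refueling becomes relevant.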

Paper Structure (17 sections, 3 equations, 4 figures)

Introduction
Related Work
Mission Model
Operational Scenario
Transfer and Maneuver Model
Refueling and $\Delta V$ Budgeting
Mission Constraints
Planning Algorithms
Greedy Heuristic
Monte Carlo Tree Search (MCTS)
Reinforcement Learning Agent
Unified Evaluation Framework
Results and Discussion
Debris Removal Efficiency
Computation Time
...and 2 more sections

Figures (4)

Figure 1: Overview of the co-elliptic Hohmann transfer sequence, showing the initial and target orbits, transfer maneuvers, and the terminal safety ellipse maneuver at rendezvous.
Figure 2: Zoomed-in view of the terminal approach phase, highlighting the safety ellipse maneuver for controlled and safe rendezvous with the target debris.
Figure 3: Number of debris objects visited per episode by each algorithm across 100 randomized test cases.
Figure 4: Computation time (log scale) for each algorithm per test case. Masked PPO and Greedy achieve fast runtimes, while MCTS is orders of magnitude slower.
