Evaluating Robustness and Adaptability in Learning-Based Mission Planning for Active Debris Removal

Agni Bandyopadhyay, Günther Waxenegger-Wilfing

TL;DR

This paper investigates autonomous ADR planning in Low Earth Orbit under hard resource constraints by comparing three planners: a nominal PPO policy, a domain-randomized PPO policy, and a plain MCTS baseline. Using high-fidelity orbital simulations with refueling and randomized debris fields, the study shows that nominal PPO performs best when conditions match training, domain-randomized PPO is more robust when constraints shift, and MCTS provides strong adaptability at the cost of heavy computation. The results highlight a fundamental trade-off between fast policies suited to onboard inference and the adaptability of online planning, and suggest that hybrid approaches blending training-time diversity with online planning could yield resilient ADR mission planners. The findings bear directly on deploying ADR systems that must balance speed, reliability, and flexibility in evolving space environments.
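Why MCTS adapts to constraint changes "for free" is easiest to see in code: it builds a fresh search tree from the current state, including the remaining budget, at every decision. The sketch below is a generic UCT-style MCTS on a toy debris-sequencing problem, not the authors' baseline. It assumes a hypothetical symmetric delta-v cost matrix `cost`, tracks only delta-v (no mission time, refueling, or orbital dynamics), and uses illustrative names (`feasible_actions`, `step`, `mcts_plan`) that do not come from the paper.

```python
import math
import random

# Hypothetical toy model. State = (current_index, dv_left_km_s, visited_frozenset);
# index 0 is the servicer's starting orbit, indices 1..N are debris targets.

def feasible_actions(state, cost):
    """Unvisited targets reachable with the remaining delta-v."""
    current, dv_left, visited = state
    return [j for j in range(len(cost)) if j not in visited and cost[current][j] <= dv_left]

def step(state, action, cost):
    """Transfer to `action`, paying its delta-v cost."""
    current, dv_left, visited = state
    return (action, dv_left - cost[current][action], visited | {action})

class Node:
    def __init__(self, state, cost, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children = []
        self.untried = feasible_actions(state, cost)  # actions not yet expanded
        self.visits = 0
        self.value = 0.0

def rollout(state, cost, rng):
    """Uniform random playout; returns how many more targets get visited."""
    count = 0
    while actions := feasible_actions(state, cost):
        state = step(state, rng.choice(actions), cost)
        count += 1
    return count

def mcts_plan(root_state, cost, iterations=500, c=1.4, seed=0):
    """Return the next target to visit from root_state, or None if none is reachable."""
    rng = random.Random(seed)
    root = Node(root_state, cost)
    if not root.untried:
        return None
    n_targets = len(cost) - 1  # normalizes rewards to [0, 1]
    for _ in range(iterations):
        node, depth = root, 0
        # 1. Selection: descend through fully expanded nodes using UCB1.
        while not node.untried and node.children:
            parent = node
            node = max(parent.children, key=lambda ch: ch.value / ch.visits
                       + c * math.sqrt(math.log(parent.visits) / ch.visits))
            depth += 1
        # 2. Expansion: add one untried child.
        if node.untried:
            action = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(step(node.state, action, cost), cost, parent=node, action=action)
            node.children.append(child)
            node, depth = child, depth + 1
        # 3. Simulation: reward = targets visited from the root, normalized.
        reward = (depth + rollout(node.state, cost, rng)) / n_targets
        # 4. Backpropagation.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Commit to the most-visited first action.
    return max(root.children, key=lambda ch: ch.visits).action

# Online replanning: re-run the search from whatever state the mission is in.
cost = [[0.0, 0.4, 0.9, 1.5],
        [0.4, 0.0, 0.5, 1.2],
        [0.9, 0.5, 0.0, 0.6],
        [1.5, 1.2, 0.6, 0.0]]
state, plan = (0, 1.6, frozenset({0})), []
while (a := mcts_plan(state, cost)) is not None:
    plan.append(a)
    state = step(state, a, cost)
print(plan)
```

Because each call starts from the current `dv_left`, cutting the budget mid-mission simply changes the root state; nothing has to be retrained. The price is the full search repeated at every decision, which mirrors the computation-time gap the paper reports.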

Abstract

Autonomous mission planning for Active Debris Removal (ADR) must balance efficiency, adaptability, and strict feasibility constraints on fuel and mission duration. This work compares three planners for the constrained multi-debris rendezvous problem in Low Earth Orbit: a nominal Masked Proximal Policy Optimization (PPO) policy trained under fixed mission parameters, a domain-randomized Masked PPO policy trained across varying mission constraints for improved robustness, and a plain Monte Carlo Tree Search (MCTS) baseline. Evaluations are conducted in a high-fidelity orbital simulation with refueling, realistic transfer dynamics, and randomized debris fields across 300 test cases in nominal, reduced fuel, and reduced mission time scenarios. Results show that nominal PPO achieves top performance when conditions match training but degrades sharply under distributional shift, while domain-randomized PPO exhibits improved adaptability with only moderate loss in nominal performance. MCTS consistently handles constraint changes best due to online replanning but incurs orders-of-magnitude higher computation time. The findings underline a trade-off between the speed of learned policies and the adaptability of search-based methods, and suggest that combining training-time diversity with online planning could be a promising path for future resilient ADR mission planners.
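The two PPO variants differ mainly in what they see during training, while both rely on action masking to rule out infeasible moves. The sketch below illustrates both ideas in isolation: per-episode sampling of mission constraints (domain randomization) and a feasibility mask that zeroes out the probability of targets that would exceed the remaining delta-v or time budget. The randomization ranges, per-target costs, and function names are hypothetical; the ranges were chosen only to span the nominal (3 km/s, 7 days) and reduced (1 km/s, 3 days) scenarios named in the figure captions, and refueling is omitted. Masked PPO implementations such as sb3-contrib's `MaskablePPO` achieve the same effect by suppressing invalid logits before sampling.

```python
import math
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class MissionConstraints:
    dv_budget_km_s: float  # total delta-v available
    duration_days: float   # maximum mission duration

# Hypothetical randomization ranges (not taken from the paper).
DV_RANGE_KM_S = (1.0, 3.0)
DURATION_RANGE_DAYS = (3.0, 7.0)

def sample_constraints(rng: random.Random) -> MissionConstraints:
    """Domain randomization: draw new constraints at the start of each episode."""
    return MissionConstraints(
        dv_budget_km_s=rng.uniform(*DV_RANGE_KM_S),
        duration_days=rng.uniform(*DURATION_RANGE_DAYS),
    )

def action_mask(dv_left, days_left, dv_costs, day_costs, visited):
    """True for each target that is unvisited and reachable within both budgets."""
    return [
        not seen and dv <= dv_left and t <= days_left
        for dv, t, seen in zip(dv_costs, day_costs, visited)
    ]

def masked_policy(logits, mask):
    """Softmax over valid actions only; masked actions get probability 0."""
    valid = [l for l, ok in zip(logits, mask) if ok]
    if not valid:
        raise ValueError("no feasible action; the episode should terminate")
    top = max(valid)  # subtract the max for numerical stability
    weights = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(42)
constraints = sample_constraints(rng)
mask = action_mask(
    dv_left=constraints.dv_budget_km_s,
    days_left=constraints.duration_days,
    dv_costs=[0.4, 0.9, 3.5, 0.2],    # illustrative per-target delta-v costs (km/s)
    day_costs=[1.0, 2.0, 1.5, 8.0],   # illustrative per-target transfer times (days)
    visited=[False, False, False, False],
)
probs = masked_policy([0.1, 1.2, 2.0, -0.3], mask)
print(constraints, mask, probs)
```

Target 2 exceeds every budget in the sampled range and target 3 exceeds every duration, so both stay masked whatever the seed. A nominal policy would instead fix `MissionConstraints(3.0, 7.0)` for every episode, which is one way to picture why it is never exposed to the tighter regimes it later fails on.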

Paper Structure

This paper contains 15 sections, 2 equations, 6 figures, and 1 table.

Figures (6)

  • Figure 1: Overview of the co-elliptic Hohmann transfer sequence, showing the initial and target orbits, transfer maneuvers, and the terminal safety ellipse maneuver at rendezvous.
  • Figure 2: Zoomed-in view of the terminal approach phase, highlighting the safety ellipse maneuver for controlled and safe rendezvous with the target debris.
  • Figure 3: Debris visited under nominal mission conditions (Δv budget of 3 km/s, 7-day duration) for PPO (nominal), PPO (domain-randomized), and MCTS.
  • Figure 4: Debris visited under reduced mission duration (3 days) for PPO (nominal), PPO (domain-randomized), and MCTS.
  • Figure 5: Debris visited under reduced Δv budget (1 km/s) for PPO (nominal), PPO (domain-randomized), and MCTS.
  • ...and 1 more figure
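To give a sense of scale for the Δv budgets in Figures 3 to 5, the sketch below computes the two burns and the transfer time of an idealized coplanar Hohmann transfer between circular orbits. This is a simplification of the co-elliptic sequence in Figure 1: it ignores phasing, plane changes, perturbations, and the terminal safety ellipse, and the 500 km and 800 km altitudes are arbitrary example values, not cases from the paper.

```python
import math

MU_EARTH_KM3_S2 = 398600.4418  # Earth's gravitational parameter
R_EARTH_KM = 6378.137          # Earth's equatorial radius

def hohmann(r1_km: float, r2_km: float) -> tuple[float, float, float]:
    """Return (dv1_km_s, dv2_km_s, transfer_time_s) for a coplanar circular-to-circular Hohmann transfer."""
    a_t = 0.5 * (r1_km + r2_km)              # semi-major axis of the transfer ellipse
    v1 = math.sqrt(MU_EARTH_KM3_S2 / r1_km)  # circular speed in the initial orbit
    v2 = math.sqrt(MU_EARTH_KM3_S2 / r2_km)  # circular speed in the target orbit
    # Burn 1 enters the transfer ellipse; burn 2 circularizes at r2.
    dv1 = abs(v1 * (math.sqrt(2 * r2_km / (r1_km + r2_km)) - 1))
    dv2 = abs(v2 * (1 - math.sqrt(2 * r1_km / (r1_km + r2_km))))
    tof = math.pi * math.sqrt(a_t**3 / MU_EARTH_KM3_S2)  # half the transfer-orbit period
    return dv1, dv2, tof

dv1, dv2, tof = hohmann(R_EARTH_KM + 500.0, R_EARTH_KM + 800.0)
print(f"dv1={dv1:.4f} km/s, dv2={dv2:.4f} km/s, total={dv1 + dv2:.4f} km/s, tof={tof / 60:.1f} min")
```

For this 300 km altitude raise the two burns total roughly 0.16 km/s, so a multi-target sequence can consume a 1 km/s budget quickly once phasing and plane-change costs, which this sketch omits, are added.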