Safe Obstacle-Free Guidance of Space Manipulators in Debris Removal Missions via Deep Reinforcement Learning
Vincent Lam, Robin Chhabra
TL;DR
This work tackles safe debris removal with a free-floating space manipulator by combining a TD3-based trajectory planner operating in SE(3) with a robust local controller. The approach uses a dual-objective, multi-critic TD3 framework with prioritized experience replay (PER) to jointly optimize target tracking and collision avoidance, while a Lie-group-based controller provides stable, dexterous execution and resilience to singularities. Key contributions include an architecture with two critics per objective, the separation of capture and obstacle states, and a PER strategy that accelerates convergence in 3D space. Simulation results with a 7-DOF manipulator demonstrate rapid convergence and safe behavior in both obstacle-free and obstacle-rich scenarios, indicating practical potential for debris-removal missions in space.
Abstract
The objective of this study is to develop a model-free workspace trajectory planner for space manipulators using a Twin Delayed Deep Deterministic Policy Gradient (TD3) agent to enable safe and reliable debris capture. A local control strategy with singularity avoidance and manipulability enhancement is employed to ensure stable execution. The manipulator must simultaneously track a capture point on a non-cooperative target, avoid self-collisions, and prevent unintended contact with the target. To address these challenges, we propose a curriculum-based multi-critic network in which one critic emphasizes accurate tracking and the other enforces collision avoidance. A prioritized experience replay buffer is also used to accelerate convergence and improve policy robustness. The framework is evaluated in MATLAB/Simulink on a simulated seven-degree-of-freedom KUKA LBR iiwa mounted on a free-floating base, demonstrating safe and adaptive trajectory generation for debris removal missions.
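As a rough illustration of the multi-critic idea, the sketch below computes standard TD3 learning targets separately for a tracking objective and a collision-avoidance objective. It assumes each objective has its own reward signal and its own pair of target critics, and it combines the two value estimates for the actor with a weighted sum. The function names, the weights `w_track` and `w_avoid`, and the toy numbers are hypothetical and are not taken from the paper; only the clipped double-Q target and the target policy smoothing are standard TD3 components. The example is written in Python with NumPy rather than MATLAB/Simulink.

```python
import numpy as np


def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Standard TD3 target for one objective: r + gamma * (1 - done) * min(Q1', Q2')."""
    reward = np.asarray(reward, dtype=float)
    done = np.asarray(done, dtype=float)
    # Taking the smaller of the two target-critic estimates curbs overestimation.
    q_min = np.minimum(np.asarray(q1_next, dtype=float), np.asarray(q2_next, dtype=float))
    return reward + gamma * (1.0 - done) * q_min


def smoothed_target_action(mu_next, rng, sigma=0.2, noise_clip=0.5, a_max=1.0):
    """Standard TD3 target policy smoothing: add clipped Gaussian noise, then clip the action."""
    mu_next = np.asarray(mu_next, dtype=float)
    noise = np.clip(rng.normal(0.0, sigma, size=mu_next.shape), -noise_clip, noise_clip)
    return np.clip(mu_next + noise, -a_max, a_max)


def combined_actor_value(q_track, q_avoid, w_track=1.0, w_avoid=1.0):
    """ASSUMPTION: the actor ascends a weighted sum of the two objectives' values."""
    return w_track * np.asarray(q_track, dtype=float) + w_avoid * np.asarray(q_avoid, dtype=float)


# Toy batch of two transitions (all numbers are made up for illustration).
done = np.array([0.0, 1.0])      # the second transition ends its episode
r_track = np.array([1.0, 0.5])   # hypothetical tracking reward
r_avoid = np.array([0.0, -1.0])  # hypothetical collision penalty

# Each objective has its own pair of target-critic outputs at the next state.
y_track = clipped_double_q_target(r_track, done, [2.0, 3.0], [1.0, 4.0], gamma=0.9)
y_avoid = clipped_double_q_target(r_avoid, done, [-1.0, -2.0], [-0.5, -3.0], gamma=0.9)

print("tracking targets: ", y_track)
print("avoidance targets:", y_avoid)
print("combined value:   ", combined_actor_value(y_track, y_avoid, w_track=1.0, w_avoid=2.0))
```

Keeping the targets separate lets each critic pair regress on its own reward, so a large collision penalty does not distort the tracking value estimate. How the paper actually combines the critics in the actor update, and how the curriculum schedules them, is not specified here and may differ from the weighted sum assumed above.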
