Safe Obstacle-Free Guidance of Space Manipulators in Debris Removal Missions via Deep Reinforcement Learning

Vincent Lam, Robin Chhabra

TL;DR

This work addresses safe debris removal with a free-floating space manipulator by combining a TD3-based trajectory planner operating in SE(3) with a robust local controller. The approach uses a dual-objective, multi-critic TD3 framework and prioritized experience replay (PER) to jointly optimize target tracking and collision avoidance, while a Lie-group-based controller provides stable, dexterous execution and resilience to singularities. Key contributions include a two-critics-per-objective architecture, the separation of capture and obstacle states, and a PER strategy that accelerates convergence in 3D space. Results in a simulated 7-DOF environment show rapid convergence and safe behavior in both obstacle-free and obstacle-rich scenarios, indicating practical potential for debris-removal missions.
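
As a rough illustration of the two-critics-per-objective idea, the sketch below computes standard TD3 clipped double-Q targets with one column per objective (for example, tracking and collision avoidance). The function name `td3_targets`, the array layout, and the default discount factor are assumptions made for this example; the summary does not specify how the paper forms or combines its per-objective targets.

```python
import numpy as np

def td3_targets(rewards, next_q1, next_q2, dones, gamma=0.99):
    """Clipped double-Q targets, computed independently for each objective.

    rewards, next_q1, next_q2: arrays of shape (batch, n_objectives).
        next_q1 / next_q2 are assumed to be the two target critics of each
        objective, already evaluated at the (smoothed) target action.
    dones: array of shape (batch,), 1.0 where the episode terminated.
    """
    rewards = np.asarray(rewards, dtype=float)
    # TD3 takes the minimum of the twin critics to limit overestimation.
    min_q = np.minimum(np.asarray(next_q1, dtype=float),
                       np.asarray(next_q2, dtype=float))
    # Terminal transitions do not bootstrap from the next state.
    not_done = 1.0 - np.asarray(dones, dtype=float)[:, None]
    return rewards + gamma * not_done * min_q
```

Each column of the result would serve as the regression target for the pair of critics attached to that objective.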

Abstract

The objective of this study is to develop a model-free workspace trajectory planner for space manipulators using a Twin Delayed Deep Deterministic Policy Gradient (TD3) agent to enable safe and reliable debris capture. A local control strategy with singularity avoidance and manipulability enhancement is employed to ensure stable execution. The manipulator must simultaneously track a capture point on a non-cooperative target, avoid self-collisions, and prevent unintended contact with the target. To address these challenges, we propose a curriculum-based multi-critic network in which one critic emphasizes accurate tracking and the other enforces collision avoidance. A prioritized experience replay buffer is also used to accelerate convergence and improve policy robustness. The framework is evaluated on a simulated seven-degree-of-freedom KUKA LBR iiwa mounted on a free-floating base in MATLAB/Simulink, demonstrating safe and adaptive trajectory generation for debris removal missions.
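
To make the prioritized experience replay idea concrete, here is a minimal sketch of a proportional prioritized buffer with importance-sampling weights. The abstract does not say which PER variant is used, so the proportional scheme, the class name `PrioritizedReplayBuffer`, and the hyperparameters `alpha`, `beta`, and `eps` are all assumptions for illustration rather than the authors' implementation.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Proportional PER with O(n) sampling (hypothetical, for illustration)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6, seed=0):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling
        self.eps = eps              # keeps every priority strictly positive
        self.data = []
        self.priorities = np.zeros(capacity, dtype=float)
        self.pos = 0                # next write index (ring buffer)
        self.rng = np.random.default_rng(seed)

    def add(self, transition):
        # New transitions get the current maximum priority so that each one
        # is likely to be replayed at least once.
        max_p = self.priorities[:len(self.data)].max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        n = len(self.data)
        scaled = self.priorities[:n] ** self.alpha
        probs = scaled / scaled.sum()
        idx = self.rng.choice(n, size=batch_size, p=probs)
        # Importance-sampling weights correct the bias from non-uniform draws.
        weights = (n * probs[idx]) ** (-beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        # Priority is the absolute TD error plus a small constant.
        self.priorities[np.asarray(idx)] = np.abs(np.asarray(td_errors, dtype=float)) + self.eps
```

In a training loop, the returned weights would scale each sample's critic loss, and `update_priorities` would be called with the new TD errors after every update. With several critics, which TD error drives the priority is a design choice that this summary leaves open.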

Paper Structure

This paper contains 17 sections, 42 equations, 6 figures, and 1 algorithm.

Figures (6)

  • Figure 1: Simulation Block Diagram
  • Figure 2: Task 1 Simulation Environment
  • Figure 3: Task 2 Simulation Environment
  • Figure 4: Task 1 Capture Reward Graph
  • Figure 5: Task 2 Capture Reward Graph
  • ...and 1 more figure

Theorems & Definitions (1)

  • Remark 1