Trajectory Planning for Teleoperated Space Manipulators Using Deep Reinforcement Learning
Bo Xia, Xianru Tian, Bo Yuan, Zhiheng Li, Bin Liang, Xueqian Wang
TL;DR
The paper addresses trajectory planning for teleoperated space manipulators under communication time delays, which can disrupt the Markov property and degrade control performance. It presents a deep reinforcement learning (DRL) framework that couples a Delay Information Processing (DIP) module with a Soft Actor-Critic (SAC) decision module and offers three strategies for handling delayed observations: Mapping, Prediction, and State Augmentation. Experiments in MuJoCo across four base/target configurations show that State Augmentation is the most robust and efficient strategy, with SACAS achieving the highest and most stable success rates under varying $d_{RTD}$ and random delays. The approach reduces reliance on precise dynamic models and demonstrates that DRL-based planning is practical for time-delayed teleoperation in space robotics.
Abstract
Trajectory planning for teleoperated space manipulators involves challenges such as accurately modeling system dynamics, particularly in free-floating modes with non-holonomic constraints, and managing time delays that increase model uncertainty and reduce control precision. Traditional teleoperation methods rely on precise dynamic models that require complex parameter identification and calibration, while data-driven methods need no prior knowledge of the dynamics but struggle with time delays. A novel framework based on deep reinforcement learning (DRL) is introduced to address these challenges. The framework offers three methods (Mapping, Prediction, and State Augmentation) for processing the delayed state information received at the master end. The Soft Actor-Critic (SAC) algorithm then uses the processed state information to compute the next action, which is sent to the remote manipulator for interaction with the environment. Four environments are constructed on the MuJoCo simulation platform to cover variations in base and target fixation: fixed base and target, fixed base with rotated target, free-floating base with fixed target, and free-floating base with rotated target. Extensive experiments with both constant and random delays are conducted to evaluate the proposed methods. Results demonstrate that all three methods effectively address the trajectory planning problem, with State Augmentation showing superior efficiency and robustness.
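The abstract does not spell out how State Augmentation is constructed. A common construction in delayed reinforcement learning, assumed here purely for illustration, appends the actions issued during the delay window to the most recent delayed observation, which restores the Markov property when the delay is constant. The sketch below shows that idea in plain Python; the class name `StateAugmenter`, its methods, and the constant delay measured in control steps are hypothetical and are not taken from the paper.

```python
from collections import deque
from typing import Deque, List, Sequence


class StateAugmenter:
    """Hypothetical helper: delayed observation + recent actions -> augmented state.

    Assumption: the delay is a known, constant number of control steps.
    """

    def __init__(self, delay_steps: int, action_dim: int) -> None:
        if delay_steps < 0:
            raise ValueError("delay_steps must be non-negative")
        self.delay_steps = delay_steps
        self.action_dim = action_dim
        # Actions already sent but not yet reflected in the delayed observation.
        # A bounded deque drops the oldest action automatically on append.
        self._pending: Deque[List[float]] = deque(maxlen=delay_steps)
        self.reset()

    def reset(self) -> None:
        """Fill the window with zero actions at the start of an episode."""
        self._pending.clear()
        for _ in range(self.delay_steps):
            self._pending.append([0.0] * self.action_dim)

    def augment(self, delayed_obs: Sequence[float]) -> List[float]:
        """Concatenate the delayed observation with the pending actions (oldest first)."""
        state = list(delayed_obs)
        for action in self._pending:
            state.extend(action)
        return state

    def record(self, action: Sequence[float]) -> None:
        """Store the action just sent to the remote manipulator."""
        if len(action) != self.action_dim:
            raise ValueError("action has the wrong dimension")
        self._pending.append(list(action))


# Usage: a 2-step delay with a 1-dimensional action.
augmenter = StateAugmenter(delay_steps=2, action_dim=1)
first_state = augmenter.augment([1.0, 2.0])   # observation plus two zero actions
augmenter.record([0.5])                       # action chosen by the policy
second_state = augmenter.augment([1.1, 2.1])  # window now holds [0.0], [0.5]
```

Under these assumptions the policy input has dimension `obs_dim + delay_steps * action_dim`, so the augmented state grows linearly with the delay. A random delay would need a window sized for the largest expected delay, which this sketch does not handle.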
