SpaceOctopus: An Octopus-inspired Motion Planning Framework for Multi-arm Space Robot

Wenbo Zhao, Shengjie Wang, Yixuan Fan, Yang Gao, Tao Zhang

Abstract

Space robots have played a critical role in autonomous maintenance and space junk removal. Multi-arm space robots can efficiently complete target capture and base reorientation tasks thanks to their flexibility and the collaborative capabilities of their arms. However, the complex coupling that arises from both the multiple arms and the free-floating base makes motion planning for multi-arm space robots challenging. We observe that the octopus elegantly achieves similar goals when grabbing prey and escaping from danger. Inspired by the distributed control of octopuses' limbs, we develop a multi-level decentralized motion planning framework to manage the movement of the different arms of a space robot. This motion planning framework integrates naturally with the multi-agent reinforcement learning (MARL) paradigm. The results indicate that our method outperforms the previous centralized-training method. Leveraging the flexibility of the decentralized framework, we reassemble policies trained for different tasks, enabling the space robot to complete trajectory planning tasks while adjusting the base attitude without further learning. Furthermore, our experiments confirm the superior robustness of our method in the face of external disturbances, changing base masses, and even the failure of one arm.

Paper Structure (25 sections, 10 equations, 9 figures, 1 table)

This paper contains 25 sections, 10 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: The similarities between space robots and octopuses lie in their environments (zero-gravity outer space / underwater world), configurations (multiple robotic arms / multiple tentacles), and tasks (target capture / hunting). Inspired by the distributed brains of octopuses, we adopt a distributed control framework for space robots, in which each robotic arm learns its own strategy hierarchically for different tasks.
  • Figure 2: (a) Simulation environment of the four-arm space robot. (b) In the trajectory planning task, the target position is sampled in a $0.3 \times 0.3 \times 0.3\ \mathrm{m}^3$ cube in front of each arm, with a randomly sampled desired orientation.
  • Figure 3: The agent division methodology comprises three levels: the single-arm level, the multi-arm level, and the task level. To enhance the capabilities of the space robot, at the task level we can assign various tasks, such as trajectory planning and base reorientation, to the multiple arms of the space robot. The six joints of each arm are controlled by two controllers that achieve the desired position and orientation of the end-effector, respectively.
  • Figure 4: Average performance of MAPPO, PPO and MADDPG over three seeds; the x-axis is the training iteration. In the trajectory planning task, the rewards of all agents in MAPPO and MADDPG are summed to allow comparison with PPO. MAPPO outperforms the other two algorithms in both the trajectory planning and base reorientation tasks.
  • Figure 5: The steady-state error of the end-effector position and orientation and of the base attitude for different algorithms. The results are obtained under 30 random seeds for each task.
  • ...and 4 more figures
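
The target sampling described in the Figure 2(b) caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sample_target_pose`, the `cube_center` parameter, and the example center are hypothetical, and drawing the orientation uniformly over rotations is an assumption (the caption only says it is randomly sampled). Only the 0.3 m cube side comes from the caption.

```python
import math
import random

CUBE_SIDE_M = 0.3  # cube side length, taken from the Figure 2(b) caption


def sample_target_pose(rng, cube_center):
    """Sample one arm's target as (position, unit quaternion).

    cube_center is a hypothetical parameter: the caption only states that
    the cube lies in front of each arm, not where exactly.
    """
    half = CUBE_SIDE_M / 2
    # Position: uniform inside the axis-aligned cube around cube_center.
    position = tuple(c + rng.uniform(-half, half) for c in cube_center)
    # Orientation (assumption): normalizing a 4-D Gaussian sample yields a
    # unit quaternion distributed uniformly over rotations.
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(x * x for x in q))
    quaternion = tuple(x / norm for x in q)
    return position, quaternion


rng = random.Random(0)  # fixed seed so the sketch is reproducible
position, quaternion = sample_target_pose(rng, cube_center=(0.5, 0.0, 0.0))
```

Passing the random generator explicitly keeps sampling reproducible per seed, which matters when results are reported over a fixed number of random seeds.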