Optimized Coordination Strategy for Multi-Aerospace Systems in Pick-and-Place Tasks By Deep Neural Network
Ye Zhang, Linyue Chu, Letian Xu, Kangtong Mo, Zhengjian Kang, Xingyu Zhang
TL;DR
The paper addresses autonomous coordination for multi-robot orbital debris capture. It proposes a deep reinforcement learning approach that trains a DNN policy in a MuJoCo-based environment to maximize the effective object transfer rate $R$. The method introduces a dual-arm force/torque decomposition and a differentiable Newton-Euler dynamics model whose learnable parameters are constrained to remain physically valid (e.g., $m_i=\exp(\alpha_i)$ and $f_{ci}=\exp(\beta_i)$ are positive by construction), which enables end-to-end RL for coordinated debris manipulation. The key contributions are the integration of a differentiable dynamics model with RL and a comparison against a heuristic baseline, in which the learned policy achieves up to 16% higher retrieval efficiency; two-robot hardware experiments support its real-world viability. The work advances scalable, autonomous debris management in space by combining physics-informed dynamics with data-driven coordination policies.
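The exponential parameterization mentioned above can be illustrated with a small, self-contained sketch. This is not the paper's implementation: it assumes a 1-DoF toy model in which $f_{c}$ acts as a Coulomb friction coefficient, uses synthetic data, and all function names (`predict_force`, `fit`) are hypothetical. The point is only that optimizing the unconstrained variables $\alpha$ and $\beta$ by gradient descent keeps $m=\exp(\alpha)$ and $f_c=\exp(\beta)$ strictly positive at every step.

```python
import math


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def predict_force(alpha, beta, accel, vel):
    """Toy 1-DoF model (an assumption): f = m*a + f_c*sign(v),
    with m = exp(alpha) and f_c = exp(beta), both positive by construction."""
    return math.exp(alpha) * accel + math.exp(beta) * sign(vel)


def fit(samples, alpha=0.0, beta=0.0, lr=0.05, steps=2000):
    """Gradient descent on the mean squared force error with respect to
    the unconstrained parameters alpha and beta."""
    n = len(samples)
    for _ in range(steps):
        m, f_c = math.exp(alpha), math.exp(beta)
        g_alpha = g_beta = 0.0
        for accel, vel, force in samples:
            err = m * accel + f_c * sign(vel) - force
            # Chain rule: dm/dalpha = m and df_c/dbeta = f_c.
            g_alpha += 2.0 * err * accel * m / n
            g_beta += 2.0 * err * sign(vel) * f_c / n
        alpha -= lr * g_alpha
        beta -= lr * g_beta
    return alpha, beta


# Synthetic, noise-free data from hypothetical ground-truth values.
TRUE_M, TRUE_FC = 2.0, 0.5
samples = [
    (a, v, TRUE_M * a + TRUE_FC * sign(v))
    for a in (-1.0, -0.5, 0.5, 1.0)
    for v in (-1.0, 1.0)
]

alpha_hat, beta_hat = fit(samples)
m_hat, fc_hat = math.exp(alpha_hat), math.exp(beta_hat)
print(f"estimated mass: {m_hat:.4f}, estimated friction: {fc_hat:.4f}")
```

Because the optimizer only ever updates $\alpha$ and $\beta$, no projection or clipping step is needed to keep the mass and friction estimates positive; the same idea would carry over to an automatic-differentiation framework, where the chain-rule terms would be computed for you.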
Abstract
In this paper, we present a strategy for the coordinated control of a multi-agent aerospace system that uses Deep Neural Networks (DNNs) within a reinforcement learning framework. Our approach centers on optimizing autonomous task assignment to improve the system's operational efficiency in object relocation tasks, framed as an aerospace-oriented pick-and-place scenario. We model this coordination problem in a MuJoCo environment and use a deep reinforcement learning algorithm to train a DNN-based policy that maximizes task completion rates across the multi-agent system. The objective function is explicitly designed to maximize the effective object transfer rate, relying on the neural network to handle the complex, high-dimensional state and action spaces of aerospace environments. In extensive simulations, we benchmark the proposed method against a heuristic combinatorial approach rooted in game-theoretic principles; the trained policy achieves up to 16% higher task efficiency. We further validate the approach experimentally on a multi-agent hardware setup to assess its efficacy in a real-world aerospace scenario.
