A Local Information Aggregation based Multi-Agent Reinforcement Learning for Robot Swarm Dynamic Task Allocation
Yang Lv, Jinlong Lei, Peng Yi
TL;DR
This work tackles dynamic task allocation for robot swarms under partial observability by formulating the problem as a decentralized partially observable Markov decision process (Dec_POMDP) and proposing LIA_MADDPG, a CTDE-based MARL approach that uses a Local Information Aggregation (LIA) module to focus learning on nearby, relevant agents. Key innovations include a shared policy and value network trained offline; an extended Q-function $G$ that aggregates locally observed information through the operator $\varphi_i$ using distance-based weights $w_{i,k}^t$; and an online strategy-improvement mechanism with deviation probability $\delta_{i,g_i^t}$ that adapts to changing conditions. The approach is reported to converge faster and to be more stable and scalable than six baselines (IND_DQN, IND_DDPG, MAAC, QMIX, LINDA_QMIX, MAPPO) across varying swarm sizes, and it is also evaluated in high-fidelity physics simulations for practical relevance. The results show higher Normalized Average Total Utility (NATU), lower Normalized Average Time Cost (NATC), and a higher Dominance Ratio (DR), supporting the framework’s potential for real-world dynamic robot swarm coordination with enhanced local collaboration and adaptive strategy execution.
Abstract
In this paper, we explore how to optimize task allocation for robot swarms in dynamic environments, emphasizing the necessity of formulating robust, flexible, and scalable strategies for robot cooperation. We introduce a novel framework using a decentralized partially observable Markov decision process (Dec_POMDP), specifically designed for distributed robot swarm networks. At the core of our methodology is the Local Information Aggregation Multi-Agent Deep Deterministic Policy Gradient (LIA_MADDPG) algorithm, which merges centralized training with distributed execution (CTDE). During the centralized training phase, a local information aggregation (LIA) module is meticulously designed to gather critical data from neighboring robots, enhancing decision-making efficiency. In the distributed execution phase, a strategy improvement method is proposed to dynamically adjust task allocation based on changing and partially observable environmental conditions. Our empirical evaluations show that the LIA module can be seamlessly integrated into various CTDE-based MARL methods, significantly enhancing their performance. Additionally, by comparing LIA_MADDPG with six conventional reinforcement learning algorithms and a heuristic algorithm, we demonstrate its superior scalability, rapid adaptation to environmental changes, and ability to maintain both stability and convergence speed. These results underscore LIA_MADDPG's outstanding performance and its potential to significantly improve dynamic task allocation in robot swarms through enhanced local collaboration and adaptive strategy execution.
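To make the idea of local information aggregation concrete, the following is a minimal sketch of a distance-weighted aggregation over neighboring robots. It is an illustration only, not the paper's LIA module: the function name `aggregate_local_info`, the sensing `radius`, and the normalized inverse-distance weighting are all assumptions chosen for simplicity, and the actual definitions of $w_{i,k}^t$ and $\varphi_i$ may differ.

```python
import numpy as np

def aggregate_local_info(positions, features, i, radius):
    """Distance-weighted average of neighbor features for agent i.

    Hypothetical illustration: neighbors are agents within `radius` of
    agent i, and weights are inverse distances normalized to sum to 1.
    This weighting is an assumption, not the paper's definition.
    Returns (aggregated_feature, weights over all agents).
    """
    positions = np.asarray(positions, dtype=float)
    features = np.asarray(features, dtype=float)

    # Euclidean distance from agent i to every agent.
    dists = np.linalg.norm(positions - positions[i], axis=1)

    # Keep only agents inside the sensing radius, excluding agent i itself.
    mask = dists <= radius
    mask[i] = False

    weights = np.zeros(len(positions))
    if not mask.any():
        # No neighbor in range: nothing to aggregate.
        return np.zeros(features.shape[1]), weights

    # Closer neighbors get larger weights; the small constant avoids
    # division by zero when two agents share a position.
    raw = 1.0 / (dists[mask] + 1e-8)
    weights[mask] = raw / raw.sum()

    return weights @ features, weights
```

For example, with an agent at the origin and neighbors at distances 1 and 3 inside the radius, the nearer neighbor receives roughly three times the weight of the farther one, while any agent outside the radius contributes nothing.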
