MAIDCRL: Semi-centralized Multi-Agent Influence Dense-CNN Reinforcement Learning
Ayesha Siddika Nipu, Siming Liu, Anthony Harris
TL;DR
This work addresses multi-agent reinforcement learning in complex cooperative and competitive settings by presenting MAIDCRL, a semi-centralized Dense-CNN RL approach that aggregates agent influence maps (AIMs) into a multi-agent influence map (MAIM) to drive coordinated control in StarCraft II SMAC scenarios. MAIDCRL extends the prior MAIDRL by adding a CNN-based feature extractor to its DenseNet architecture, so that it captures spatial structure in the global information and learns more efficiently. Empirical results on homogeneous and heterogeneous SMAC maps show faster convergence and higher performance than MAIDRL, along with greater robustness across seeds and clearer learned coordination strategies. The results indicate that combining AIM-based global context with spatial feature learning can improve multi-agent coordination in complex domains, and the approach may generalize to other multi-agent system (MAS) tasks.
Abstract
Distributed decision-making in multi-agent systems poses difficult challenges for interactive behavior learning in both cooperative and competitive settings. To mitigate this complexity, MAIDRL presents a semi-centralized Dense Reinforcement Learning algorithm, enhanced by agent influence maps (AIMs), for learning effective multi-agent control on StarCraft Multi-Agent Challenge (SMAC) scenarios. In this paper, we extend the DenseNet in MAIDRL and introduce semi-centralized Multi-Agent Influence Dense-CNN Reinforcement Learning, MAIDCRL, by incorporating convolutional layers into the deep model architecture, and we evaluate its performance on both homogeneous and heterogeneous scenarios. The results show that the CNN-enabled MAIDCRL significantly improves learning performance and learns faster than the existing MAIDRL, especially on the more complicated heterogeneous SMAC scenarios. We further investigate the stability and robustness of our model. The statistics show that our model not only achieves a higher winning rate in all the given scenarios but also accelerates the agents' learning of fine-grained decision-making.
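To make the idea of aggregating agent influence maps concrete, the following is a minimal sketch, not the formulation used in MAIDRL or MAIDCRL. It assumes a small rectangular grid, a linear distance decay, and hypothetical function names, strengths, and radius; none of these values come from the paper. Each agent contributes one map, and the per-agent maps are summed cell by cell into a single global map, the kind of 2-D input that convolutional layers could then process.

```python
import math


def agent_influence_map(width, height, x, y, strength, radius):
    """Hypothetical AIM: influence peaks at the agent's cell and
    decays linearly to zero at distance `radius` (assumed decay rule)."""
    grid = [[0.0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            dist = math.hypot(col - x, row - y)
            if dist < radius:
                grid[row][col] = strength * (1.0 - dist / radius)
    return grid


def aggregate_maim(aims):
    """Sum per-agent maps cell by cell into one global map (assumed aggregation)."""
    height, width = len(aims[0]), len(aims[0][0])
    maim = [[0.0] * width for _ in range(height)]
    for aim in aims:
        for row in range(height):
            for col in range(width):
                maim[row][col] += aim[row][col]
    return maim


# Illustrative agents as (x, y, strength); the values are made up.
agents = [(1, 1, 1.0), (3, 2, 0.5)]
aims = [agent_influence_map(5, 4, x, y, s, radius=2.0) for x, y, s in agents]
maim = aggregate_maim(aims)
```

In this sketch the global map keeps the spatial layout of the agents, which is the property a convolutional feature extractor is meant to exploit; a real implementation would derive strength and decay from unit attributes defined by the authors.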
