Tacit Learning with Adaptive Information Selection for Cooperative Multi-Agent Reinforcement Learning
Lunjun Liu, Weilai Jiang, Yaonan Wang
TL;DR
This work targets two core challenges in CTDE-based MARL: autonomously filtering the information relevant to cooperation, and cooperating effectively under communication limitations. It introduces SICA, a framework built from three blocks (Selection, Communication, and Regeneration) that enables adaptive information selection and tacit learning, gradually transitioning from centralized to decentralized execution. Integrated with QMIX-style value decomposition and trained with an alignment loss that progressively reconstructs the true information, SICA outperforms both traditional CTDE methods and explicit communication baselines on SMAC, SMACv2, and GRF. The results indicate that selective information processing and gradual information regeneration can improve coordination in complex multi-agent tasks while reducing reliance on explicit inter-agent communication, and that the approach can be plugged into existing MARL algorithms.
Abstract
In multi-agent reinforcement learning (MARL), the centralized training with decentralized execution (CTDE) framework has gained widespread adoption due to its strong performance. However, the further development of CTDE faces two key challenges. First, agents struggle to autonomously assess the relevance of input information for cooperative tasks, impairing their decision-making abilities. Second, in communication-limited scenarios with partial observability, agents are unable to access global information, restricting their ability to collaborate effectively from a global perspective. To address these challenges, we introduce a novel cooperative MARL framework based on information selection and tacit learning. In this framework, agents gradually develop implicit coordination during training, enabling them to infer the cooperative behavior of others in a discrete space without communication, relying solely on local information. Moreover, we integrate gating and selection mechanisms, allowing agents to adaptively filter information based on environmental changes, thereby enhancing their decision-making capabilities. Experiments on popular MARL benchmarks show that our framework can be seamlessly integrated with state-of-the-art algorithms, leading to significant performance improvements.
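To make the ideas above concrete, the following is a minimal sketch of three ingredients the summary mentions: gating of input features, an alignment loss between regenerated and true information, and a gradual shift from true to regenerated information during training. It is not the authors' implementation. The sigmoid gate, the mean-squared-error form of the alignment loss, the linear annealing schedule, and all function and parameter names (`gate_features`, `alignment_loss`, `mix_ratio`, `blended_info`, `anneal_steps`) are assumptions chosen for illustration; the text above does not specify them.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gate_features(features, gate_logits):
    """Scale each feature by a gate in (0, 1).

    Assumption: an element-wise sigmoid gate stands in for the
    gating/selection mechanism, whose exact form is not given here.
    """
    if len(features) != len(gate_logits):
        raise ValueError("features and gate_logits must have equal length")
    return [f * sigmoid(g) for f, g in zip(features, gate_logits)]


def alignment_loss(regenerated, true_info):
    """Mean squared error between regenerated and true information.

    Assumption: MSE is used only as a simple example of an alignment loss.
    """
    if len(regenerated) != len(true_info):
        raise ValueError("inputs must have equal length")
    return sum((r - t) ** 2 for r, t in zip(regenerated, true_info)) / len(true_info)


def mix_ratio(step, anneal_steps):
    """Fraction of regenerated information used at a training step.

    Assumption: a linear schedule from 0 (all true information)
    to 1 (all regenerated information), clipped to [0, 1].
    """
    return min(1.0, max(0.0, step / anneal_steps))


def blended_info(true_info, regenerated, step, anneal_steps):
    """Interpolate between true and locally regenerated information."""
    a = mix_ratio(step, anneal_steps)
    return [(1.0 - a) * t + a * r for t, r in zip(true_info, regenerated)]
```

Under these assumptions, an agent would consume `blended_info(...)` during training, so that early on it relies on the true shared information and by the end of the schedule it relies only on what it regenerates from local observations, while `alignment_loss` pulls the regenerated values toward the true ones.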
