Tacit Learning with Adaptive Information Selection for Cooperative Multi-Agent Reinforcement Learning

Lunjun Liu, Weilai Jiang, Yaonan Wang

TL;DR

This work targets two core challenges in CTDE-based MARL: autonomously filtering the information relevant to cooperation, and cooperating effectively when communication is limited. It introduces SICA, a framework built from three blocks (Selection, Communication, and Regeneration) that enables adaptive information selection and tacit learning, gradually transitioning from centralized to decentralized execution. By integrating with QMIX-style value decomposition and introducing an alignment loss to progressively reconstruct true information, SICA achieves superior performance on SMAC, SMACv2, and GRF, outperforming both traditional CTDE methods and explicit communication baselines. The results indicate that selective information processing and gradual information regeneration can improve coordination in complex multi-agent tasks while reducing reliance on explicit inter-agent communication, and that the framework has strong plug-and-play potential for existing MARL algorithms.

Abstract

In multi-agent reinforcement learning (MARL), the centralized training with decentralized execution (CTDE) framework has gained widespread adoption due to its strong performance. However, the further development of CTDE faces two key challenges. First, agents struggle to autonomously assess the relevance of input information for cooperative tasks, impairing their decision-making abilities. Second, in communication-limited scenarios with partial observability, agents are unable to access global information, restricting their ability to collaborate effectively from a global perspective. To address these challenges, we introduce a novel cooperative MARL framework based on information selection and tacit learning. In this framework, agents gradually develop implicit coordination during training, enabling them to infer the cooperative behavior of others in a discrete space without communication, relying solely on local information. Moreover, we integrate gating and selection mechanisms, allowing agents to adaptively filter information based on environmental changes, thereby enhancing their decision-making capabilities. Experiments on popular MARL benchmarks show that our framework can be seamlessly integrated with state-of-the-art algorithms, leading to significant performance improvements.
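The abstract names three ingredients: a gating mechanism that filters inputs, tacit learning that lets agents infer missing information from local inputs, and (per the TL;DR) an alignment loss that pulls the inferred information toward the true information. The sketch below is only an illustration of how such pieces could fit together, not the authors' implementation. The function names, the sigmoid gate, the mean-squared-error form of the loss, and the linear annealing schedule are all assumptions made for this example.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gate_features(features, gate_logits):
    """Scale each feature by a gate in (0, 1).

    Assumption: a per-feature sigmoid gate stands in for the paper's
    gating and selection mechanism, whose exact form is not given here.
    """
    if len(features) != len(gate_logits):
        raise ValueError("features and gate_logits must have equal length")
    return [f * sigmoid(g) for f, g in zip(features, gate_logits)]


def alignment_loss(regenerated, true_message):
    """Mean squared error between regenerated and true information.

    Assumption: MSE is used for illustration; the source only says that
    an alignment loss is introduced.
    """
    if len(regenerated) != len(true_message):
        raise ValueError("inputs must have equal length")
    squared = [(r - t) ** 2 for r, t in zip(regenerated, true_message)]
    return sum(squared) / len(squared)


def mix_inputs(true_message, regenerated, progress):
    """Blend true and regenerated information as training progresses.

    progress = 0.0 uses only the true (communicated) information;
    progress = 1.0 uses only the locally regenerated information.
    Assumption: a linear schedule is a hypothetical choice.
    """
    p = min(max(progress, 0.0), 1.0)
    return [(1.0 - p) * t + p * r for t, r in zip(true_message, regenerated)]


if __name__ == "__main__":
    obs = [1.0, 2.0, -3.0]
    gated = gate_features(obs, [4.0, 0.0, -4.0])  # keep, halve, mostly drop
    true_msg = [0.5, 1.0, 0.0]
    print("gated observation:", gated)
    print("alignment loss:", alignment_loss(gated, true_msg))
    print("input at 50% progress:", mix_inputs(true_msg, gated, 0.5))
```

In this sketch, an agent would rely on `mix_inputs` with `progress` near 0 early in training and near 1 by the end, so that at execution time it needs only the locally regenerated vector and no communication channel.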

Paper Structure

This paper contains 13 sections, 14 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: A case study of the selection mechanism. The colored sections mark the information that the agent elects to remember, whereas the white sections mark the information that the agent chooses to ignore.
  • Figure 2: The overall framework of SICA. We illustrate the network architecture using the example of the $N$th agent. The overall architecture comprises a mixing network and agent networks (left). Details of the agent network include the Selection Block, Communication Block and Regeneration Block (middle). The Selection Block (right)
  • Figure 3: Performance comparison between SICA and baselines on SMAC.
  • Figure 4: Performance comparison between SICA and baselines on SMACv2.
  • Figure 5: Learning curves with different numbers of agents in SMACv2 protoss.
  • ...and 5 more figures