M2I2: Learning Efficient Multi-Agent Communication via Masked State Modeling and Intention Inference

Chuxiong Sun, Peng He, Qirui Ji, Zehua Zang, Jiangmeng Li, Rui Wang, Wei Wang

TL;DR

M2I2 targets the core MARL challenge of integrating shared information under partial observability. It combines masked state modeling with an inverse model to infer teammates’ joint actions, and introduces a Dimensional Rational Network (DRN) trained via meta-learning to adaptively weight observation dimensions; an importance-based masking strategy further enhances communication efficiency. Self-supervised objectives for state reconstruction and joint-action prediction guide representation learning, improving decision quality during decentralized execution. Empirical results across Hallway, Predator-Prey, and SMAC benchmarks show strong performance, superior efficiency at reduced communication, and robust generalization to other MARL baselines and sight-range settings.

Abstract

Communication is essential in coordinating the behaviors of multiple agents. However, existing methods primarily emphasize content, timing, and partners for information sharing, often neglecting the critical aspect of integrating shared information. This gap can significantly impact agents' ability to understand and respond to complex, uncertain interactions, thus affecting overall communication efficiency. To address this issue, we introduce M2I2, a novel framework designed to enhance the agents' capabilities to assimilate and utilize received information effectively. M2I2 equips agents with advanced capabilities for masked state modeling and joint-action prediction, enriching their perception of environmental uncertainties and facilitating the anticipation of teammates' intentions. This approach ensures that agents are furnished with both comprehensive and relevant information, supporting more informed and synergistic behaviors. Moreover, we propose a Dimensional Rational Network, trained via a meta-learning paradigm, to identify the importance of individual dimensions of information by evaluating their contributions to decision-making and auxiliary tasks. We then implement an importance-based heuristic for selective information masking and sharing. This strategy improves the efficiency of masked state modeling and the rationale behind information sharing. We evaluate M2I2 across diverse multi-agent tasks; the results demonstrate its superior performance, efficiency, and generalization capabilities over existing state-of-the-art methods in various complex scenarios.
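
The abstract does not spell out the importance-based heuristic, so the following is only a minimal sketch of one plausible reading: given a per-dimension importance score (standing in for what a Dimensional Rational Network might output), keep the highest-scoring dimensions of an observation and zero out the rest before sharing. The function names, the `keep_ratio` parameter, and the top-k rule are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def top_k_mask(importance: Sequence[float], keep_ratio: float) -> List[int]:
    """Return a 0/1 mask that keeps the highest-scoring dimensions.

    Hypothetical helper: `importance` is a placeholder for learned
    per-dimension weights; the top-k rule is an assumed heuristic.
    """
    if not 0.0 <= keep_ratio <= 1.0:
        raise ValueError("keep_ratio must lie in [0, 1]")
    n_keep = round(keep_ratio * len(importance))
    # Rank dimension indices from most to least important (ties keep index order).
    ranked = sorted(range(len(importance)), key=lambda i: -importance[i])
    kept = set(ranked[:n_keep])
    return [1 if i in kept else 0 for i in range(len(importance))]


def mask_observation(
    obs: Sequence[float], importance: Sequence[float], keep_ratio: float
) -> Tuple[List[float], List[int]]:
    """Zero out the dimensions of `obs` that the mask drops."""
    if len(obs) != len(importance):
        raise ValueError("obs and importance must have the same length")
    mask = top_k_mask(importance, keep_ratio)
    message = [x * m for x, m in zip(obs, mask)]
    return message, mask


if __name__ == "__main__":
    obs = [0.2, -1.0, 3.5, 0.7]
    importance = [0.1, 0.9, 0.4, 0.8]  # made-up scores for illustration
    message, mask = mask_observation(obs, importance, keep_ratio=0.5)
    print(mask, message)
```

With `keep_ratio=0.5`, only the two dimensions with the largest scores survive in the shared message. The same mask could in principle be inverted to choose which dimensions to hide for a reconstruction objective, but which direction the paper uses is not stated in this excerpt.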

Paper Structure (27 sections, 10 equations, 10 figures, 6 tables, 1 algorithm)

Figures (10)

  • Figure 1: Framework of M2I2. Like other CTDE approaches in MARL, M2I2 uses global states and joint actions only during the centralized training phase. M2I2 distinguishes itself through its self-supervised auxiliary tasks, which enable agents to develop representations from received messages, enhancing their ability to comprehend global states and infer teammates' joint actions. This capability becomes particularly valuable during the decentralized execution phase, where agents must operate based on limited observations.
  • Figure 2: Performance on multiple benchmarks.
  • Figure 3: Ablation.
  • Figure 4: Generation.
  • Figure 5: Multiple environments considered in our experiments.
  • ...and 5 more figures