MetaMind: General and Cognitive World Models in Multi-Agent Systems by Meta-Theory of Mind

Lingyi Wang, Rashed Shelim, Walid Saad, Naren Ramakrishna

TL;DR

This paper proposes MetaMind, a general and cognitive world model for multi-agent systems built on a novel meta-theory of mind (Meta-ToM) framework. MetaMind achieves superior task performance and outperforms baselines in few-shot multi-agent generalization.

Abstract

A major challenge for world models in multi-agent systems is to understand interdependent agent dynamics, predict interactive multi-agent trajectories, and plan over long horizons with collective awareness, without centralized supervision or explicit communication. In this paper, MetaMind, a general and cognitive world model for multi-agent systems that leverages a novel meta-theory of mind (Meta-ToM) framework, is proposed. Through MetaMind, each agent learns not only to predict and plan over its own beliefs, but also to inversely reason goals and beliefs from its own behavior trajectories. This self-reflective, bidirectional inference loop enables each agent to learn a metacognitive ability in a self-supervised manner. Then, MetaMind is shown to generalize the metacognitive ability from first-person to third-person through analogical reasoning. Thus, in multi-agent systems, each agent with MetaMind can actively reason about goals and beliefs of other agents from limited, observable behavior trajectories in a zero-shot manner, and then adapt to emergent collective intention without an explicit communication mechanism. Extended simulation results on diverse multi-agent tasks demonstrate that MetaMind can achieve superior task performance and outperform baselines in few-shot multi-agent generalization.

Paper Structure (46 sections, 3 theorems, 58 equations, 6 figures, 3 tables, 3 algorithms)

Key Result

Theorem 1

Let the observable behavior trajectory $\boldsymbol{a}^j_{0:H-1}$ be generated by $\pi^j$ and $\mathcal{D}^j$ under the true goal $\boldsymbol{g}^*$. Assume $\pi(\boldsymbol{b},\boldsymbol{g})$ is $\xi_\pi$-Lipschitz in $\boldsymbol{b}$ and $\mathcal{D}(\cdot,\boldsymbol{a}^j,\boldsymbol{g})$ is $\xi_{\mathcal{D}}$-Lipschitz [...] where $\varepsilon_t := \|\Delta \mathcal{D}_t(\boldsymbol{b}_t^j)\| + \xi_{\mathcal{D}}\|\Delta$ [...]
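
As a rough illustration of what identifying a goal from an observed behavior trajectory can look like in the discrete case, the sketch below scores each candidate goal by how well a goal-conditioned policy reproduces the observed actions and picks the best-scoring one. Everything in it is an assumption made for illustration: the 2-D toy `policy`, the additive `dynamics`, the Euclidean action residual, and the helper names are hypothetical, and this is not the paper's Behavioral TD residual or its Meta-ToM procedure.

```python
import math

def policy(belief, goal):
    """Hypothetical goal-conditioned policy: step toward the goal, at most unit length."""
    dx, dy = goal[0] - belief[0], goal[1] - belief[1]
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        return (0.0, 0.0)
    scale = min(1.0, norm) / norm
    return (dx * scale, dy * scale)

def dynamics(belief, action):
    """Hypothetical belief dynamics: the belief state simply moves by the action."""
    return (belief[0] + action[0], belief[1] + action[1])

def rollout(start, goal, horizon):
    """Generate an action trajectory of length `horizon` under a fixed goal."""
    belief, actions = start, []
    for _ in range(horizon):
        action = policy(belief, goal)
        actions.append(action)
        belief = dynamics(belief, action)
    return actions

def identify_goal(start, observed_actions, candidate_goals):
    """Pick the candidate goal with the smallest cumulative action residual."""
    residuals = {}
    for goal in candidate_goals:
        belief, total = start, 0.0
        for observed in observed_actions:
            predicted = policy(belief, goal)
            total += math.dist(predicted, observed)  # illustrative residual, an assumption
            belief = dynamics(belief, observed)      # track the observed behavior
        residuals[goal] = total
    best = min(residuals, key=residuals.get)
    return best, residuals

# Toy usage: another agent acts for H = 3 steps under a goal hidden from the observer.
goals = [(5.0, 0.0), (0.0, 5.0), (-5.0, 0.0)]
observed = rollout(start=(0.0, 0.0), goal=(0.0, 5.0), horizon=3)
best, residuals = identify_goal((0.0, 0.0), observed, goals)
margin = min(r for g, r in residuals.items() if g != best) - residuals[best]
```

In this toy setting the true goal has zero residual and the gap to the runner-up (`margin`) is positive, so the goal is identifiable from three steps. With a noisy or imperfectly learned policy and dynamics, longer horizons would generally be needed before such a gap becomes reliable.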

Figures (6)

  • Figure 1: The proposed general and cognitive MetaMind in multi-agent systems.
  • Figure 2: Performance comparison on 8 SMAC tasks under limited environment steps. The win rates are averaged over 100 test episodes.
  • Figure 3: Performance comparison of win rate (Mean ± Std.) on 8 SMAC tasks versus different imagination horizon length.
  • Figure 4: Performance comparison of win rate in few-shot multi-agent generalization on 8 SMAC tasks under limited environment steps.
  • Figure 5: The scalability of a multi-map (MM) MetaMind to perform and cooperate over 13 SMAC maps, and the average horizon for goal identification of MT and single-map (SM) MetaMinds.
  • ...and 1 more figure

Theorems & Definitions (13)

  • Definition 1: Behavioral TD residual
  • Definition 2: Residual margin for discrete goal identification
  • Theorem 1: Goal identification
  • proof
  • Lemma 1: Local linear margin via Lipschitz Jacobian
  • proof
  • Theorem 2: Sufficient horizon size for goal identification
  • proof
  • Remark 1
  • proof
  • ...and 3 more