Inferring Latent Temporal Sparse Coordination Graph for Multi-Agent Reinforcement Learning

Wei Duan, Jie Lu, Junyu Xuan

TL;DR

This paper tackles the challenge of coordinating multiple agents in cooperative MARL by learning latent temporal sparse graphs from observation trajectories, addressing the limitations of one-step graph methods and expensive action-pair computations. The proposed Latent Temporal Sparse Coordination Graph (LTS-CG) uses an encoder to produce a sparse agent-pair graph from trajectories, plus two regularizers (Predict-Future and Infer-Present) that shape meaningful graphs; graph learning is trained end-to-end together with the MARL learner (QMIX). Empirical results on StarCraft II show faster convergence, lower variance, and better scalability than graph-based and non-graph baselines; ablations confirm the benefits of trajectory-based graph learning and of the two graph-learning characteristics. Overall, LTS-CG offers a scalable, interpretable, and effective approach to graph-based coordination in complex multi-agent tasks.
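
The TL;DR's first stage works in three steps: encode each agent's observation history, score every agent pair, then sample a sparse graph. The following is a minimal sketch under stated assumptions, not the authors' implementation:

- Mean pooling stands in for the learned trajectory encoder.
- A sigmoid of a dot product stands in for the pairwise scorer.
- All function names are hypothetical.

```python
import math
import random

def encode_trajectory(traj):
    """Hypothetical stand-in for the learned encoder: mean-pool an agent's
    observation history (a list of equal-length observation vectors)."""
    dim = len(traj[0])
    return [sum(step[d] for step in traj) / len(traj) for d in range(dim)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def edge_probabilities(trajectories):
    """Return an N x N agent-pair probability matrix with a zero diagonal.
    Entry [i][j] is sigmoid(z_i . z_j), an assumed scoring function."""
    z = [encode_trajectory(t) for t in trajectories]
    n = len(z)
    probs = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                score = sum(a * b for a, b in zip(z[i], z[j]))
                probs[i][j] = sigmoid(score)
    return probs

def sample_sparse_graph(probs, rng):
    """Sample each directed edge i->j independently as Bernoulli(probs[i][j])."""
    n = len(probs)
    return [[1 if rng.random() < probs[i][j] else 0 for j in range(n)]
            for i in range(n)]

# Toy example: 3 agents, 2 time steps, 2-dimensional observations.
trajectories = [
    [[1.0, 0.0], [0.8, 0.2]],    # agent 0
    [[0.9, 0.1], [1.0, 0.0]],    # agent 1 (behaves like agent 0)
    [[-1.0, 0.5], [-0.6, 0.3]],  # agent 2 (behaves differently)
]
P = edge_probabilities(trajectories)
A = sample_sparse_graph(P, random.Random(0))
print(P[0][1] > P[0][2])  # True: similar histories get a likelier edge
```

**Cost.** Scores are computed per agent pair, so the work grows with the N(N-1) pair scores and is independent of the action space. A method that scores action pairs instead evaluates on the order of |A|² payoff entries per edge.

**Differentiability.** Independent Bernoulli sampling is not differentiable. End-to-end training therefore needs a relaxation such as Gumbel-softmax or a straight-through estimator; this sketch does not claim which one LTS-CG uses.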

Abstract

Effective agent coordination is crucial in cooperative Multi-Agent Reinforcement Learning (MARL). While agent cooperation can be represented by graph structures, prevailing graph learning methods in MARL are limited. They rely solely on one-step observations and neglect crucial historical experiences, which leads to deficient graphs that foster redundant or detrimental information exchanges. Additionally, the high computational cost of action-pair calculations in dense graphs impedes scalability. To address these challenges, we propose inferring a Latent Temporal Sparse Coordination Graph (LTS-CG) for MARL. LTS-CG uses agents' historical observations to calculate an agent-pair probability matrix, from which a sparse graph is sampled and used for knowledge exchange between agents, thereby capturing both agent dependencies and relation uncertainty. The computational complexity of this procedure depends only on the number of agents. The graph learning process is further augmented by two innovative characteristics: Predict-Future, which enables agents to foresee upcoming observations, and Infer-Present, which ensures a thorough grasp of the environmental context from limited data. Together, these features allow LTS-CG to construct temporal graphs from historical and real-time information, promoting knowledge exchange during policy learning and effective collaboration. Graph learning and agent training occur simultaneously in an end-to-end manner. Results on the StarCraft II benchmark demonstrate LTS-CG's superior performance.
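
The abstract's Predict-Future and Infer-Present characteristics act as graph-learning losses; the Figure 2 caption calls them graph loss functions. The sketch below shows how such losses could score a candidate graph. It is heavily hedged: the averaging "decoder", the use of mean squared error, the state dimension, and the weights `lam_pf` and `lam_ip` are all assumptions, not details from the paper. In this toy, a graph that routes useful neighbor information lowers both losses, which is the pressure that shapes the learned graph.

```python
def aggregate(features, adj):
    """Average each agent's own vector with those of its neighbors in adj."""
    n, dim = len(features), len(features[0])
    out = []
    for i in range(n):
        group = [i] + [j for j in range(n) if adj[i][j] and j != i]
        out.append([sum(features[k][d] for k in group) / len(group)
                    for d in range(dim)])
    return out

def mse(pred, target):
    """Mean squared error over all entries of two equally shaped matrices."""
    diffs = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)

def predict_future_loss(obs_t, obs_next, adj):
    # Hypothetical decoder: predict o_{t+1} from graph-aggregated o_t.
    return mse(aggregate(obs_t, adj), obs_next)

def infer_present_loss(obs_t, state_t, adj):
    # Hypothetical decoder: every agent estimates the current global state s_t
    # (assumed here to share the observation dimension) from aggregated o_t.
    estimates = aggregate(obs_t, adj)
    return mse(estimates, [state_t] * len(estimates))

def graph_loss(obs_t, obs_next, state_t, adj, lam_pf=1.0, lam_ip=1.0):
    """Weighted sum of the two regularizers; lam_pf and lam_ip are hypothetical."""
    return (lam_pf * predict_future_loss(obs_t, obs_next, adj)
            + lam_ip * infer_present_loss(obs_t, state_t, adj))

obs_t = [[1.0, 0.0], [0.0, 1.0]]
obs_next = [[0.5, 0.5], [0.5, 0.5]]
state_t = [0.5, 0.5]
connected = [[0, 1], [1, 0]]
isolated = [[0, 0], [0, 0]]
print(graph_loss(obs_t, obs_next, state_t, connected))  # 0.0
print(graph_loss(obs_t, obs_next, state_t, isolated))   # 0.5
```

In an end-to-end setup, a loss of this kind would be added to the RL objective (for example, the QMIX TD loss) so that gradients reach the graph encoder. Here the adjacency is fixed, so the example only shows how the losses rank different graphs.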

Paper Structure

This paper contains 27 sections, 17 equations, 12 figures, and 4 tables.

Figures (12)

  • Figure 1: The current methods to infer latent graphs in MARL can be categorized into three types: (a) fully connected unweighted graphs, (b) fully connected weighted graphs, and (c) sparse weighted graphs. These methods rely solely on one-step observations, leading to deficient graphs that foster redundant or detrimental information exchanges and suffer from high computational complexity for action-pair calculations.
  • Figure 2: The framework of LTS-CG. LTS-CG consists of two key modules: Inter-Agent Sparse Graph Learning and Cooperative MARL. The former follows an encoder-decoder framework: the encoder generates the sparse graph structure, while the decoder, guided by two graph loss functions, learns Predict-Future for anticipating future steps and Infer-Present for deducing current states. The temporal graph structure integrates past experiences and adjusts edge weights based on current observations. This graph is then fed into the attention-based graph convolution of the Cooperative MARL module, enabling knowledge exchange for effective coordination. Graph learning and agent training occur end-to-end.
  • Figure 3: Performance of our method and baselines on six maps of the StarCraft II benchmark (Samvelyan et al., 2019). The Y-axis is the test win rate; the X-axis is the number of training steps.
  • Figure 4: Performance comparison on the 25m and 27m_vs_30m maps. Due to the high computational complexity, SOP-CG and DCG could not complete 2 million steps within a week, and CASEC exceeded the 48 GB GPU memory limit.
  • Figure 5: Performance comparison of non-graph-based methods on 3s5z and 8m_vs_9m.
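
The Figure 2 caption states that the sampled graph is fed into an attention-based graph convolution in the Cooperative MARL module. The following is a minimal sketch of one such layer, assuming scaled dot-product attention masked by the adjacency matrix plus a self-loop. It omits the learned query, key, and value projections that a real layer would have, and the function names are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_graph_conv(h, adj):
    """One round of scaled dot-product attention restricted to graph edges:
    agent i attends over itself and every j with adj[i][j] == 1."""
    n, dim = len(h), len(h[0])
    scale = math.sqrt(dim)
    out = []
    for i in range(n):
        nbrs = [i] + [j for j in range(n) if adj[i][j] and j != i]
        scores = [sum(a * b for a, b in zip(h[i], h[j])) / scale for j in nbrs]
        weights = softmax(scores)
        out.append([sum(w * h[j][d] for w, j in zip(weights, nbrs))
                    for d in range(dim)])
    return out

h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # per-agent hidden states
empty = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
sparse = [[0, 0, 1], [0, 0, 0], [0, 0, 0]]  # only edge 0 -> 2
print(attention_graph_conv(h, empty) == h)  # True: no edges, no exchange
print(attention_graph_conv(h, sparse)[0])   # [1.0, 0.5]
```

Masking by the sparse graph means an agent only mixes in messages from its sampled neighbors, so per-agent cost scales with its degree rather than with N. In a QMIX-style setup, the updated states would feed each agent's utility network, whose values the mixer then combines.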