Reducing Redundant Computation in Multi-Agent Coordination through Locally Centralized Execution

Yidong Bai, Toshiharu Sugawara

TL;DR

This work tackles redundant computation in multi-agent coordination by introducing Locally Centralized Execution (LCE) and the locally centralized team transformer (LCTT) within a CTLCE framework. It defines a redundant observation metric and builds two components: the Team Transformer, which lets leaders send targeted messages to individual workers, and Leadership Shift, which dynamically reallocates leadership among agents. In Level-Based Foraging experiments, LCTT reduces computation while delivering comparable rewards and faster convergence, with two leaders often yielding the best trade-off between guidance and autonomy. The approach offers a scalable middle ground between centralized and fully decentralized schemes and opens avenues for deeper integration with inter-agent communication strategies.

Abstract

In multi-agent reinforcement learning, decentralized execution is a common approach, yet it suffers from the redundant computation problem. This occurs when multiple agents perform the same or similar computation because their observations overlap. To address this issue, this study introduces a novel method referred to as the locally centralized team transformer (LCTT). LCTT establishes a locally centralized execution framework in which selected agents serve as leaders and issue instructions, while the remaining agents, designated as workers, act on these instructions without activating their policy networks. For LCTT, we propose the team-transformer (T-Trans) architecture, which allows leaders to provide specific instructions to each worker, and the leadership shift mechanism, which allows agents to autonomously decide their roles as leaders or workers. Our experimental results demonstrate that the proposed method effectively reduces redundant computation, does not decrease reward levels, and leads to faster learning convergence.
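The leader/worker split described above can be illustrated with a minimal sketch. It is not the authors' implementation: `lce_step`, `toy_policy`, the fixed `teams` assignment, and the assumption that every worker belongs to exactly one leader are all hypothetical, chosen only to show that the number of policy evaluations scales with the number of leaders rather than the number of agents.

```python
from typing import Callable, Dict, List, Tuple

Observation = List[float]
N_ACTIONS = 5  # hypothetical size of the discrete action space


def toy_policy(leader_obs: Observation, members: List[int]) -> Dict[int, int]:
    """Stand-in for a leader's policy network (hypothetical).

    Takes the leader's observation and returns one action per team member,
    i.e. the leader's own action plus an instruction for each worker.
    """
    base = int(sum(leader_obs))
    return {agent_id: (base + agent_id) % N_ACTIONS for agent_id in members}


def lce_step(
    observations: Dict[int, Observation],
    leaders: List[int],
    teams: Dict[int, List[int]],
    policy: Callable[[Observation, List[int]], Dict[int, int]],
) -> Tuple[Dict[int, int], int]:
    """One locally centralized execution step.

    Only leaders evaluate the policy; workers simply adopt the instruction
    they receive. Returns the joint action and the number of policy calls.
    """
    actions: Dict[int, int] = {}
    policy_calls = 0
    for leader in leaders:
        members = [leader] + teams.get(leader, [])
        actions.update(policy(observations[leader], members))
        policy_calls += 1  # one evaluation per leader, none per worker
    return actions, policy_calls


# Four agents, two leaders (0 and 2), each instructing one worker.
observations = {0: [1.0, 2.0], 1: [0.5, 0.5], 2: [3.0, 1.0], 3: [0.0, 2.0]}
leaders = [0, 2]
teams = {0: [1], 2: [3]}

actions, policy_calls = lce_step(observations, leaders, teams, toy_policy)
print(actions, policy_calls)
```

In this toy setting four agents receive actions from two policy evaluations, whereas fully decentralized execution would require four. How teams are actually formed and how leadership shifts between agents are left out of the sketch.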

Paper Structure

This paper contains 13 sections, 8 equations, 5 figures, 2 algorithms.

Figures (5)

  • Figure 1: Example Environments of MADRL.
  • Figure 2: Example of a level-based foraging environment.
  • Figure 3: Example of a CTLCE framework structure, in which $Q_{tot}$ denotes the global Q-values, $s$ is the global state, and $\theta$ is the policy network.
  • Figure 4: T-Trans Structure. Parameters $d_{att}$ and $d_{gru}$ are dimensions of features of attention layers and GRU, and $d$ is the dimension of observation of respective entities; $\bm{k_i}$ and $\bm{v_i}$ are key and value matrices whose sizes are $(|E|+n) \times d_{att}$. $\bm{q_i}$ is the $n \times d_{att}$ query matrix.
  • Figure 5: Experimental results of LCTT and baselines.
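
The matrix sizes given in the Figure 4 caption can be checked with a small shape sketch. It assumes single-head scaled dot-product attention, which the caption does not specify; the function names and the concrete sizes (`n = 3` agents, `|E| = 4` entities, `d_att = 8`) are hypothetical and the inputs are random placeholders, not learned features.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product attention (an assumed form, see lead-in).

    q: n x d_att, k and v: (|E| + n) x d_att.
    Returns the n x d_att output and the n x (|E| + n) weight matrix.
    """
    d_att = len(q[0])
    scale = math.sqrt(d_att)
    # Each query row is scored against every key row.
    scores = [[sum(a * b for a, b in zip(q_row, k_row)) / scale for k_row in k]
              for q_row in q]
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted sum of the value rows.
    out = [[sum(w * v_row[j] for w, v_row in zip(w_row, v))
            for j in range(len(v[0]))]
           for w_row in weights]
    return out, weights


def random_matrix(rows: int, cols: int) -> Matrix:
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
n, num_entities, d_att = 3, 4, 8  # hypothetical sizes

q = random_matrix(n, d_att)                 # query matrix, n x d_att
k = random_matrix(num_entities + n, d_att)  # key matrix, (|E| + n) x d_att
v = random_matrix(num_entities + n, d_att)  # value matrix, (|E| + n) x d_att

out, weights = attention(q, k, v)
print(len(out), len(out[0]), len(weights), len(weights[0]))
```

Under these assumptions each of the $n$ query rows attends over all $|E| + n$ key rows, so the output keeps one $d_{att}$-dimensional row per agent.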