
Transformer-Based Scalable Multi-Agent Reinforcement Learning for Networked Systems with Long-Range Interactions

Vidur Sinha, Muhammed Ustaomeroglu, Guannan Qu

TL;DR

The paper tackles scalable MARL for large networked systems by addressing long-range dependencies and cross-topology generalization. It introduces STACCA, a framework with a centralized Graph Transformer Critic to capture global dynamics and a shared Graph Transformer Actor to generalize across networks, augmented by a counterfactual advantage compatible with state-value critics. Key contributions include the graph-transformer actor/critic architectures, a three-step counterfactual credit-assignment method with timestep normalization, and extensive experiments on epidemic containment and rumor spreading demonstrating improved performance, generalization, and scalability. The results show that transformer-based MARL can yield scalable, transferable control policies for complex, large-scale networks with practical impact on infrastructure and information dynamics.

Abstract

Multi-agent reinforcement learning (MARL) has shown promise for large-scale network control, yet existing methods face two major limitations. First, they typically rely on assumptions leading to decay properties of local agent interactions, limiting their ability to capture long-range dependencies such as cascading power failures or epidemic outbreaks. Second, most approaches lack generalizability across network topologies, requiring retraining when applied to new graphs. We introduce STACCA (Shared Transformer Actor-Critic with Counterfactual Advantage), a unified transformer-based MARL framework that addresses both challenges. STACCA employs a centralized Graph Transformer Critic to model long-range dependencies and provide system-level feedback, while its shared Graph Transformer Actor learns a generalizable policy capable of adapting across diverse network structures. Further, to improve credit assignment during training, STACCA integrates a novel counterfactual advantage estimator that is compatible with state-value critic estimates. We evaluate STACCA on epidemic containment and rumor-spreading network control tasks, demonstrating improved performance, network generalization, and scalability. These results highlight the potential of transformer-based MARL architectures to achieve scalable and generalizable control in large-scale networked systems.
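The idea of a counterfactual advantage that works with a state-value critic can be illustrated with a minimal Python sketch. It rests on several assumptions. It assumes a simulator that can be branched from the current state (`step`), a learned state-value critic (`value_fn`), and discrete per-agent actions. It also assumes a COMA-style baseline in which a one-step lookahead $r + \gamma V(s')$ stands in for an action-value critic. All names are hypothetical. The toy environment only loosely mirrors the 3-node setting of Figure 2, and the paper's three-step procedure and timestep normalization are not reproduced.

```python
from typing import Callable, Sequence, Tuple

State = Tuple[int, ...]
JointAction = Tuple[int, ...]
StepFn = Callable[[State, JointAction], Tuple[State, float]]  # hypothetical simulator API
ValueFn = Callable[[State], float]                            # hypothetical state-value critic


def one_step_q(step: StepFn, value_fn: ValueFn, state: State,
               joint_action: JointAction, gamma: float) -> float:
    """Turn a state-value critic into an action-value estimate by simulating one step."""
    next_state, reward = step(state, joint_action)
    return reward + gamma * value_fn(next_state)


def counterfactual_advantage(step: StepFn, value_fn: ValueFn, state: State,
                             joint_action: JointAction, agent: int,
                             policy_probs: Sequence[float], gamma: float = 0.99) -> float:
    """COMA-style advantage for `agent`: Q of the taken joint action minus the
    policy-weighted Q of branches in which only `agent`'s action is swapped."""
    q_taken = one_step_q(step, value_fn, state, joint_action, gamma)
    baseline = 0.0
    for alt_action, prob in enumerate(policy_probs):
        branch = list(joint_action)
        branch[agent] = alt_action  # every other agent's action is held fixed
        baseline += prob * one_step_q(step, value_fn, state, tuple(branch), gamma)
    return q_taken - baseline


# --- Toy usage (hypothetical environment and critic) ---
def toy_step(state: State, joint_action: JointAction) -> Tuple[State, float]:
    """Action 1 at a node clears its infection; reward = -(number infected after the step)."""
    next_state = tuple(0 if a == 1 else s for s, a in zip(state, joint_action))
    return next_state, -float(sum(next_state))


def toy_value(state: State) -> float:
    """Stand-in critic: more infected nodes means lower value."""
    return -float(sum(state))


adv = counterfactual_advantage(toy_step, toy_value, state=(1, 1, 0),
                               joint_action=(0, 1, 0), agent=1,
                               policy_probs=[0.5, 0.5], gamma=0.9)
print(adv)  # ≈ 0.95
```

Holding the other agents' actions fixed while swapping agent $i$'s action is what isolates that agent's contribution. If the agent's policy puts all its mass on the action it actually took, the advantage is zero. In a stochastic environment, each branch would also need the same random draws (for example, a shared seed) for the comparison to be meaningful.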

Paper Structure

This paper contains 22 sections, 14 equations, 7 figures, 1 algorithm.

Figures (7)

  • Figure 1: Graph Transformer Critic (left) and Actor (right) architectures. Node $i$ is shown in blue; its 1-hop neighborhood (green nodes) forms its local observation $o_{i,t}$ at time $t$.
  • Figure 2: One-step trajectory branching for Node 2 in a 3-node epidemic containment environment.
  • Figure 3: Reward Shaping Visualization.
  • Figure 4: Ablation experiments for STACCA: comparing STACCA to STACCA w/ MLP Actor, STACCA w/ MLP Critic, STACCA w/ GAT-Only Critic, and STACCA w/ no counterfactual advantage for the epidemic containment environment (top row) and the rumor-spreading environment (bottom row).
  • Figure 5: Epidemic Containment Environment: Comparison of MLP Policy and STACCA Graph-Transformer Policy on 100-node and 1000-node graphs of each of the 4 graph types. All examples use 25 infected seed nodes.
  • ...and 2 more figures
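Figure 1 splits the model into a global critic and a neighborhood-limited actor. That split can be sketched with masked attention in a minimal NumPy illustration. The sketch assumes single-head dot-product attention and a critic whose attention is fully unmasked and mean-pooled into $V(s)$. It also assumes an actor whose attention is masked to node $i$ plus its 1-hop neighbors, so each node sees only $o_{i,t}$. Function names and parameter shapes are hypothetical. The paper's actual layers, including any graph-attention (GAT) components referenced in the ablations, are not reproduced.

```python
import numpy as np


def one_hop_mask(adj: np.ndarray) -> np.ndarray:
    """Node i may attend to itself and its 1-hop neighbors only."""
    return (adj + np.eye(adj.shape[0])) > 0


def masked_attention(x, w_q, w_k, w_v, mask):
    """Single-head scaled dot-product attention; mask[i, j] == False blocks i -> j."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores = np.where(mask, scores, -np.inf)
    # The diagonal is always allowed, so each row max is finite.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights


def critic_value(x, p):
    """Centralized critic: full attention over all nodes, mean-pooled to a scalar V(s)."""
    n = x.shape[0]
    h, _ = masked_attention(x, p["q"], p["k"], p["v"], np.ones((n, n), dtype=bool))
    return float(h.mean(axis=0) @ p["out"])


def actor_logits(x, adj, p):
    """Shared actor: per-node action logits computed from the 1-hop neighborhood only."""
    h, _ = masked_attention(x, p["q"], p["k"], p["v"], one_hop_mask(adj))
    return h @ p["out"]


def make_params(rng, d, out_shape):
    """Random query/key/value projections plus a readout matrix (hypothetical init)."""
    p = {name: rng.normal(size=(d, d)) for name in ("q", "k", "v")}
    p["out"] = rng.normal(size=out_shape)
    return p


# --- Toy usage on a 4-node path graph 0-1-2-3 ---
rng = np.random.default_rng(0)
d = 4
critic_p = make_params(rng, d, out_shape=(d,))   # readout to a scalar value
actor_p = make_params(rng, d, out_shape=(d, 2))  # readout to 2 actions per node
adj = np.zeros((4, 4))
for i in range(3):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
x = rng.normal(size=(4, d))                      # per-node features at time t

v = critic_value(x, critic_p)
logits = actor_logits(x, adj, actor_p)           # shape (4, 2)
_, w_actor = masked_attention(x, actor_p["q"], actor_p["k"], actor_p["v"], one_hop_mask(adj))
print(w_actor[0])  # node 0's weights on nodes 2 and 3 are exactly 0
```

The actor's weights do not depend on the number of nodes, so the same parameters produce logits for a graph of any size. This property is what lets a single shared actor be reused across topologies. The critic's unmasked attention, by contrast, lets every node's embedding depend on every other node, which is how a centralized critic can reflect long-range effects.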

Theorems & Definitions (2)

  • Example 1: Epidemic Containment
  • Example 2: Rumor Spreading