
Dynamic Deep Factor Graph for Multi-Agent Reinforcement Learning

Yuchen Shi, Shihong Duan, Cheng Xu, Ran Wang, Fangwen Ye, Chau Yuen

TL;DR

The paper tackles relative overgeneralization in multi-agent reinforcement learning (MARL) by introducing Dynamic Deep Factor Graphs (DDFG), which decompose the global value function using factor graphs generated on the fly. It combines a CP-based Q-value network with a graph-structure policy implemented via a hypernetwork, and performs inference with the max-plus algorithm to coordinate agents dynamically. Key contributions include a CP tensorization for high-order local value functions, a quasi-multinomial graph policy for real-time structure generation, and a PPO-style training objective with entropy regularization. Empirical results on HO-Predator-Prey and SMAC demonstrate improved coordination, robustness to penalties, adaptation to dynamic collaboration, and better performance than state-of-the-art baselines, highlighting DDFG's practical value for scalable, adaptive MARL in complex environments.
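
The CP (CANDECOMP/PARAFAC) tensorization of a high-order local value function can be illustrated with a minimal sketch. For a factor $Q_j$ that connects $d$ agents with $n$ actions each, a rank-$R$ CP form writes the $n^d$-entry table as $Q_j(u^j) = \sum_{r=1}^{R} \prod_{i=1}^{d} f_{r,i}(u_i)$, so it stores $R \cdot d \cdot n$ numbers instead of $n^d$. In DDFG the per-agent vectors $f_{r,i}$ would be produced by the Q-value network from the trajectory; here they are fixed lists, and the function names, rank, and sizes are our own illustrative assumptions rather than the paper's code.

```python
import itertools

def cp_entry(factors, joint_action):
    """Evaluate Q_j(u^j) = sum_r prod_i factors[i][r][u_i].

    factors[i] is an R x n_i list of lists: row r holds the r-th CP
    component for the i-th agent connected to factor j (assumed layout).
    """
    rank = len(factors[0])
    total = 0.0
    for r in range(rank):
        prod = 1.0
        for agent_factor, action in zip(factors, joint_action):
            prod *= agent_factor[r][action]
        total += prod
    return total

def cp_table(factors):
    """Materialize the full local value table as {joint_action: Q_j}."""
    action_ranges = [range(len(f[0])) for f in factors]
    return {u: cp_entry(factors, u) for u in itertools.product(*action_ranges)}

def param_counts(d, n, rank):
    """CP parameter count vs. a full table for d agents with n actions each."""
    return rank * d * n, n ** d

# Rank-2 factors that encode a 2-agent "pick the same action" payoff.
identity = [[1.0, 0.0], [0.0, 1.0]]
table = cp_table([identity, identity])
print(table)  # {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}
print(param_counts(d=4, n=10, rank=8))  # (320, 10000)
```

The rank-2 identity factors reproduce a matching payoff that no sum of per-agent utilities can represent, and the parameter count shows how much smaller the CP form is than a full table for a 4-agent factor with 10 actions per agent (rank 8 is an arbitrary choice).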

Abstract

This work introduces a novel value decomposition algorithm, termed Dynamic Deep Factor Graphs (DDFG). Unlike traditional coordination graphs, DDFG leverages factor graphs to articulate the decomposition of value functions, offering enhanced flexibility and adaptability to complex value function structures. Central to DDFG is a graph structure generation policy that innovatively generates factor graph structures on-the-fly, effectively addressing the dynamic collaboration requirements among agents. DDFG strikes an optimal balance between the computational overhead associated with aggregating value functions and the performance degradation inherent in their complete decomposition. Through the application of the max-sum algorithm, DDFG efficiently identifies optimal policies. We empirically validate DDFG's efficacy in complex scenarios, including higher-order predator-prey tasks and the StarCraft II Multi-agent Challenge (SMAC), thus underscoring its capability to surmount the limitations faced by existing value decomposition algorithms. DDFG emerges as a robust solution for MARL challenges that demand nuanced understanding and facilitation of dynamic agent collaboration. The source code of DDFG is publicly available at https://github.com/SICC-Group/DDFG.
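
The max-sum step named above (called max-plus in the TL;DR) can be sketched on a tiny, tree-shaped factor graph. Agents send each factor the sum of messages from their other factors, factors send each agent the maximum of their local value plus incoming messages over the other agents' actions, and each agent finally picks the action with the highest summed belief. On trees this is exact; on loopy graphs it is an approximation and practical versions usually normalize or damp messages, which this sketch omits. The function name, the dictionary-based factor tables, and the fixed iteration count are illustrative assumptions, not DDFG's released code.

```python
import itertools

def max_sum(n_actions, factors, iters=10):
    """Max-sum (max-plus) message passing on a factor graph.

    n_actions[i] is the number of actions of agent i. Each factor is a pair
    (scope, table): scope is a tuple of agent ids and table maps every joint
    action of those agents to the local value Q_j. Returns the joint action
    decoded greedily from each agent's belief.
    """
    keys = [(j, i) for j, (scope, _) in enumerate(factors) for i in scope]
    f2v = {(j, i): [0.0] * n_actions[i] for j, i in keys}  # factor -> agent
    v2f = {(j, i): [0.0] * n_actions[i] for j, i in keys}  # agent -> factor
    for _ in range(iters):
        # Agent -> factor: sum of the messages from the agent's other factors.
        for j, i in keys:
            msg = [0.0] * n_actions[i]
            for k, i2 in keys:
                if i2 == i and k != j:
                    msg = [a + b for a, b in zip(msg, f2v[(k, i2)])]
            v2f[(j, i)] = msg
        # Factor -> agent: maximize local value plus incoming agent messages
        # over the actions of every other agent in the factor's scope.
        new_f2v = {}
        for j, (scope, table) in enumerate(factors):
            ranges = [range(n_actions[k]) for k in scope]
            for pos, i in enumerate(scope):
                msg = [float("-inf")] * n_actions[i]
                for u in itertools.product(*ranges):
                    val = table[u] + sum(v2f[(j, k)][u[p]]
                                         for p, k in enumerate(scope) if p != pos)
                    msg[u[pos]] = max(msg[u[pos]], val)
                new_f2v[(j, i)] = msg
        f2v = new_f2v
    # Each agent's belief is the sum of all messages arriving from its factors.
    joint = []
    for i, n in enumerate(n_actions):
        belief = [0.0] * n
        for j, i2 in keys:
            if i2 == i:
                belief = [a + b for a, b in zip(belief, f2v[(j, i2)])]
        joint.append(max(range(n), key=lambda a: belief[a]))
    return tuple(joint)

# Chain-shaped (tree) factor graph: agents 0-1 share one factor, 1-2 another.
n_actions = [2, 2, 2]
factors = [
    ((0, 1), {(0, 0): 2.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}),
    ((1, 2), {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 3.0}),
]
print(max_sum(n_actions, factors))  # (1, 1, 1)
```

Optimizing the first factor in isolation would pick (0, 0) with value 2, whereas message passing recovers the joint optimum (1, 1, 1) with total value 4.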

Paper Structure

This paper contains 16 sections, 3 theorems, 17 equations, and 9 figures.

Key Result

Proposition 1

In the graph policy $\rho(A_t \mid \bm{\tau}_t)$, for each local value function $Q_j$, the number of times it is connected to the $N$ agents corresponds to a random variable $X \sim P_{D_{\max}}(D_{\max}: p_1, p_2, \ldots, p_N)$, and $D_{\max}$ is obtained as
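
Proposition 1 writes the connection counts in multinomial notation. A minimal sketch, under our own reading that each factor's graph-policy head draws $D_{\max}$ agent indices with replacement from $(p_1, \ldots, p_N)$, shows how such counts arise and how their probability is computed; the factor then connects the distinct agents drawn, so its order is at most $D_{\max}$. The function names and probabilities are hypothetical, and the paper's exact quasi-multinomial construction and its expression for $D_{\max}$ are not reproduced here.

```python
import math
import random
from collections import Counter

def multinomial_pmf(counts, probs):
    """P(X = counts) for X ~ Multinomial(D_max = sum(counts); p_1, ..., p_N)."""
    d_max = sum(counts)
    coef = math.factorial(d_max)
    for c in counts:
        coef //= math.factorial(c)  # exact: partial multinomial coefficients are integers
    prob = float(coef)
    for c, p in zip(counts, probs):
        prob *= p ** c
    return prob

def sample_factor_scope(probs, d_max, rng):
    """Hypothetical graph-policy step for one factor Q_j.

    Draws d_max agent indices with replacement from the categorical
    distribution (p_1, ..., p_N). Returns the per-agent connection counts
    and the sorted list of distinct agents attached to the factor.
    """
    draws = rng.choices(range(len(probs)), weights=probs, k=d_max)
    hits = Counter(draws)
    counts = [hits.get(i, 0) for i in range(len(probs))]
    return counts, sorted(hits)

rng = random.Random(0)
probs = [0.5, 0.3, 0.2]  # assumed policy output for N = 3 agents
counts, scope = sample_factor_scope(probs, d_max=2, rng=rng)
print(counts, scope)
print(multinomial_pmf([1, 1, 0], probs))  # 0.3
```

With $D_{\max} = 2$, the factor links agents 0 and 1 with probability $2 \cdot 0.5 \cdot 0.3 = 0.3$, and it collapses to a single agent whenever the same index is drawn twice.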

Figures (9)

  • Figure 1: Visualization of the factor graph $Q(\bm{\tau}, \bm{u}) = \sum_{j \in \mathcal{J}} Q_j(\bm{u}^j \mid \bm{\tau})$.
  • Figure 2: The algorithmic framework of DDFG. (a) The network structure of Q-value function (Section 3.1). (b) The overall architecture of DDFG (Section 3). (c) The network structure of graph structure generation policy (Section 3.2).
  • Figure 3: HO-Predator-Prey environment. Predators are marked in blue, prey is marked in red, and the blue grid represents the range of movement of the predator.
  • Figure 4: Median test return for the Higher-order Predator-Prey task with different penalties $p \in \{0, -0.5, -1, -1.5\}$, comparing DDFG and baselines.
  • Figure 5: Median test return for the Higher-order Predator-Prey task with $r_t$, comparing DDFG and baselines.
  • ...and 4 more figures
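
The decomposition in Figure 1 can be made concrete with a small sketch. It assumes each factor is stored as a scope (the agents it connects) plus a table of local values already conditioned on $\bm{\tau}$; both representations are illustrative choices, not the paper's code. The exhaustive argmax enumerates every joint action, which is the exponential cost that message passing such as max-sum is meant to avoid.

```python
import itertools

def global_q(factors, joint_action):
    """Q(tau, u) = sum_j Q_j(u^j | tau) for one joint action u.

    factors: list of (scope, table); scope is a tuple of agent indices and
    table maps the sub-action u^j (actions of the agents in scope) to Q_j.
    The dependence on tau is assumed to be baked into the tables.
    """
    return sum(table[tuple(joint_action[i] for i in scope)]
               for scope, table in factors)

def greedy_joint_action(n_actions, factors):
    """Exhaustive argmax over all prod(n_actions) joint actions."""
    joint_space = itertools.product(*(range(n) for n in n_actions))
    return max(joint_space, key=lambda u: global_q(factors, u))

# A pairwise factor over agents (0, 1) and a third-order factor over (0, 1, 2).
pair = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}
triple = {u: (3.0 if u == (1, 1, 1) else 0.0)
          for u in itertools.product(range(2), repeat=3)}
factors = [((0, 1), pair), ((0, 1, 2), triple)]
print(global_q(factors, (0, 0, 1)))  # 1.0
print(greedy_joint_action([2, 2, 2], factors))  # (1, 1, 1)
```

The third-order factor rewards only the joint action in which all three agents cooperate, a dependency that a factor graph expresses directly and that a purely pairwise coordination graph can only approximate.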

Theorems & Definitions (4)

  • Definition 1
  • Proposition 1
  • Proposition 2
  • Proposition 3