Dynamic Deep Factor Graph for Multi-Agent Reinforcement Learning
Yuchen Shi, Shihong Duan, Cheng Xu, Ran Wang, Fangwen Ye, Chau Yuen
TL;DR
The paper tackles relative overgeneralization in multi-agent reinforcement learning by introducing Dynamic Deep Factor Graphs (DDFG), which decompose the global value function over factor graphs generated on the fly. DDFG combines a Q-value network based on canonical polyadic (CP) tensor decomposition with a graph-structure policy implemented as a hypernetwork, and it selects joint actions with the max-sum (also called max-plus) algorithm so that agents can coordinate dynamically. The key contributions are a CP tensorization of high-order local value functions, a quasi-multinomial graph policy that generates the graph structure in real time, and a PPO-style training objective with entropy regularization. Experiments on HO-Predator-Prey and SMAC show improved coordination, robustness to penalties, support for dynamic collaboration, and better performance than state-of-the-art baselines, which suggests the approach is practical for scalable, adaptive MARL in complex environments.
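To make the CP tensorization of a high-order local value function concrete, here is a minimal NumPy sketch. It assumes a single factor over three agents with four actions each and a rank-2 CP form; the names `cp_local_q` and `cp_full_tensor`, the sizes, and the randomly sampled factor matrices are illustrative assumptions and are not taken from the DDFG code base, where the factor matrices would presumably be produced by the Q-value network.

```python
import numpy as np

def cp_local_q(factors, actions):
    """Evaluate Q(a_1, ..., a_k) = sum_r prod_i factors[i][a_i, r]."""
    rows = [U[a] for U, a in zip(factors, actions)]  # one length-R row per agent
    return float(np.prod(rows, axis=0).sum())

def cp_full_tensor(factors):
    """Materialize the full payoff tensor (only feasible for small factors)."""
    rank = factors[0].shape[1]
    full = np.zeros([U.shape[0] for U in factors])
    for r in range(rank):
        component = factors[0][:, r]
        for U in factors[1:]:
            component = np.multiply.outer(component, U[:, r])
        full += component  # add the r-th rank-one term
    return full

# Hypothetical sizes chosen only for illustration.
rng = np.random.default_rng(0)
n_agents, n_actions, rank = 3, 4, 2
factors = [rng.normal(size=(n_actions, rank)) for _ in range(n_agents)]

full = cp_full_tensor(factors)
n_params = sum(U.size for U in factors)  # k * |A| * R parameters in CP form
```

In this toy setting the CP form stores 24 numbers instead of the 64 entries of the full tensor; the gap widens quickly as the number of agents in a factor grows, since the full tensor scales as |A|^k while the CP form scales as k·|A|·R.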
Abstract
This work introduces a novel value decomposition algorithm, termed Dynamic Deep Factor Graphs (DDFG). Unlike traditional coordination graphs, DDFG uses factor graphs to express the decomposition of the value function, which makes it more flexible and better able to adapt to complex value function structures. Central to DDFG is a graph structure generation policy that generates factor graph structures on the fly, addressing the dynamic collaboration requirements among agents. DDFG balances the computational overhead of aggregating value functions against the performance degradation caused by decomposing them completely. Through the application of the max-sum algorithm, DDFG efficiently identifies optimal policies. We empirically validate DDFG in complex scenarios, including higher-order predator-prey tasks and the StarCraft II Multi-agent Challenge (SMAC), where it overcomes limitations of existing value decomposition algorithms. DDFG is therefore a robust solution for MARL problems that require modeling and supporting dynamic agent collaboration. The source code of DDFG is publicly available at https://github.com/SICC-Group/DDFG.
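The max-sum step mentioned in the abstract can be illustrated with a small message-passing sketch. This is a generic textbook version under stated assumptions, not the authors' implementation: the function names `max_sum` and `brute_force`, the fixed toy factor graph (one three-agent factor and one pairwise factor), and the random payoff tables are all hypothetical. On an acyclic factor graph such as this one max-sum recovers the exact joint maximizer; on graphs with cycles it is only approximate.

```python
import itertools
import numpy as np

def max_sum(n_actions, factors, n_iters=10):
    """factors: {tuple of agent ids: payoff array with one axis per agent}."""
    agents = sorted({i for scope in factors for i in scope})
    f2v = {(s, i): np.zeros(n_actions) for s in factors for i in s}
    v2f = {(i, s): np.zeros(n_actions) for s in factors for i in s}
    for _ in range(n_iters):
        # Variable -> factor: sum of messages from the agent's other factors.
        for (i, s) in v2f:
            msg = sum((f2v[(t, i)] for t in factors if i in t and t != s),
                      np.zeros(n_actions))
            v2f[(i, s)] = msg - msg.mean()  # normalize; argmax is unaffected
        # Factor -> variable: add incoming messages, maximize over other agents.
        for (s, i) in f2v:
            total = factors[s].copy()
            for axis, j in enumerate(s):
                if j != i:
                    shape = [1] * len(s)
                    shape[axis] = n_actions
                    total = total + v2f[(j, s)].reshape(shape)
            other_axes = tuple(ax for ax, j in enumerate(s) if j != i)
            f2v[(s, i)] = total.max(axis=other_axes)
    # Each agent picks the action maximizing the sum of its incoming messages.
    return {i: int(np.argmax(sum(f2v[(s, i)] for s in factors if i in s)))
            for i in agents}

def brute_force(n_actions, factors, n_agents):
    """Exhaustive search over joint actions, used only to check the sketch."""
    best = max(itertools.product(range(n_actions), repeat=n_agents),
               key=lambda a: sum(q[tuple(a[i] for i in s)]
                                 for s, q in factors.items()))
    return dict(enumerate(best))

rng = np.random.default_rng(0)
factors = {(0, 1, 2): rng.normal(size=(3, 3, 3)),  # high-order factor
           (2, 3): rng.normal(size=(3, 3))}        # pairwise factor
joint_action = max_sum(3, factors)
```

The point of the sketch is the cost structure: each message only maximizes over the agents inside one factor, so the work grows with the size of the largest factor rather than with the full joint action space that `brute_force` enumerates.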
