Continuous-Time Distributed Dynamic Programming for Networked Multi-Agent Markov Decision Processes
Donghwan Lee, Han-Dong Lim, Do Wan Kim
TL;DR
The paper addresses planning in networked, cooperative multi-agent MDPs by developing continuous-time distributed dynamic programming (DP) algorithms in which each agent uses only its own reward and communicates only with its neighbors. It introduces two DP schemes: a Wang–Elia-inspired version with a consensus-coupled auxiliary variable, and a decoupled version that separates value-function estimation from parameter mixing. The authors prove global asymptotic stability and characterize the equilibrium points of both schemes using ODE and Lyapunov analyses, which gives distributed planning in multi-agent settings a control-theoretic footing. These results lay the groundwork for distributed temporal-difference learning and multi-agent Q-learning in which agents coordinate over a network without sharing their local rewards.
Abstract
The main goal of this paper is to investigate continuous-time distributed dynamic programming (DP) algorithms for networked multi-agent Markov decision problems (MAMDPs). We adopt a distributed multi-agent framework in which each agent has access only to its own reward and has no knowledge of the rewards of other agents. Moreover, each agent can share its parameters with neighboring agents through a communication network represented by a graph. We first introduce a novel distributed DP algorithm, inspired by the distributed optimization method of Wang and Elia. Next, a second distributed DP algorithm is obtained from it through a decoupling process. The convergence of both DP algorithms is proved from a systems and control perspective. This study sets the stage for new distributed temporal-difference learning algorithms.
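
To make the setting concrete, the following is a minimal sketch of a Wang–Elia-style continuous-time consensus scheme, simplified to policy evaluation so that the dynamics are linear. It is not the authors' algorithm: the function name simulate, the two-state transition matrix, the three-agent path graph, the rewards, the discount factor, and the Euler step size are all illustrative assumptions. Each agent i keeps a value estimate x_i and an auxiliary variable w_i, sees only its own reward r_i, and exchanges x and w with its graph neighbors through the Laplacian L. At equilibrium the estimates agree and solve the Bellman equation for the averaged reward.

```python
import numpy as np

def simulate(P, rewards, L, gamma=0.9, dt=0.01, steps=20000):
    """Euler-integrate the (assumed, illustrative) dynamics
         x_i' = (gamma*P - I) x_i + r_i - sum_j L_ij (x_j + w_j)
         w_i' = sum_j L_ij x_j
    Rows of x, w, and rewards are indexed by agent, columns by state."""
    n_agents, n_states = rewards.shape
    A = gamma * P - np.eye(n_states)
    x = np.zeros((n_agents, n_states))
    w = np.zeros((n_agents, n_states))
    for _ in range(steps):
        # Local Bellman residual plus consensus and integral-consensus terms.
        x_dot = x @ A.T + rewards - L @ (x + w)
        w_dot = L @ x
        x, w = x + dt * x_dot, w + dt * w_dot
    return x

# Hypothetical problem data: 2 states, 3 agents on a path graph 1-2-3.
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
rewards = np.array([[1.0, 0.0],
                    [0.0, 2.0],
                    [3.0, 1.0]])
L = np.array([[ 1.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])
gamma = 0.9

x_final = simulate(P, rewards, L, gamma=gamma)

# Centralized reference: value function of the averaged reward.
v_star = np.linalg.solve(np.eye(2) - gamma * P, rewards.mean(axis=0))
max_gap = np.abs(x_final - v_star).max()
```

In this sketch max_gap measures how far the agents' estimates are from the centralized solution. The symmetric transition matrix was chosen so that a simple quadratic Lyapunov argument applies; the paper's DP setting involves the Bellman optimality operator and calls for the analysis developed there.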
