Continuous-Time Distributed Dynamic Programming for Networked Multi-Agent Markov Decision Processes

Donghwan Lee; Han-Dong Lim; Do Wan Kim

TL;DR

The paper tackles the problem of planning in networked, cooperative multi-agent MDPs by developing continuous-time distributed dynamic programming algorithms that operate with only local rewards and neighbor communications. It introduces two DP schemes: a Wang–Elia-inspired version with a consensus-coupled auxiliary variable, and a decoupled version that separates value-function estimation from parameter mixing. The authors prove global asymptotic stability and characterize equilibrium points for both schemes using ODE and Lyapunov analyses, providing a solid control-theoretic foundation for distributed planning in multi-agent settings. These results lay the groundwork for extending to distributed temporal-difference learning and multi-agent Q-learning in reinforcement learning contexts, enabling scalable, privacy-preserving coordination in networked systems.

Abstract

The main goal of this paper is to investigate continuous-time distributed dynamic programming (DP) algorithms for networked multi-agent Markov decision problems (MAMDPs). We adopt a distributed multi-agent framework in which each agent has access only to its own rewards and has no knowledge of the rewards of other agents. Each agent can, however, share its parameters with neighboring agents through a communication network represented by a graph. We first introduce a novel distributed DP inspired by the distributed optimization method of Wang and Elia. Next, a second distributed DP is derived from it through a decoupling process. The convergence of both DP algorithms is proved from a systems and control perspective. This study sets the stage for new distributed temporal-difference learning algorithms.
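To make the Wang and Elia reference concrete, the following is a minimal sketch of the consensus-based distributed optimization dynamics that the first DP scheme is said to be inspired by. It is not the paper's DP algorithm: it applies the standard proportional-integral consensus flow to hypothetical scalar quadratic local costs $f_i(x) = \tfrac{1}{2}(x - c_i)^2$, integrated with forward Euler. The ring topology, the targets `local_targets`, the step size, and the function name are all assumptions made for illustration.

```python
def wang_elia_consensus(c, neighbors, step=0.01, num_steps=2000):
    """Forward-Euler simulation of a Wang-Elia-style flow (illustrative sketch).

    Each agent i holds an estimate x[i] and an auxiliary variable z[i]:
        dx_i/dt = -(x_i - c_i) - sum_j (x_i - x_j) - sum_j (z_i - z_j)
        dz_i/dt =  sum_j (x_i - x_j)
    where j ranges over the neighbors of i. Only c[i] (local data) and
    neighbor values are used by agent i.
    """
    n = len(c)
    x = [0.0] * n
    z = [0.0] * n
    for _ in range(num_steps):
        # Graph-Laplacian terms built from neighbor differences only.
        lx = [sum(x[i] - x[j] for j in neighbors[i]) for i in range(n)]
        lz = [sum(z[i] - z[j] for j in neighbors[i]) for i in range(n)]
        dx = [-(x[i] - c[i]) - lx[i] - lz[i] for i in range(n)]
        x = [x[i] + step * dx[i] for i in range(n)]
        z = [z[i] + step * lx[i] for i in range(n)]
    return x


# Hypothetical setup: five agents on an undirected ring.
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
local_targets = [1.0, 2.0, 3.0, 4.0, 10.0]
estimates = wang_elia_consensus(local_targets, ring)
average_target = sum(local_targets) / len(local_targets)
```

Under these assumptions every entry of `estimates` approaches `average_target`, the minimizer of the summed local costs, even though no agent ever sees another agent's target. The auxiliary variable `z` plays the role of the consensus-coupled auxiliary variable mentioned in the TL;DR.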


Paper Structure (16 sections, 5 theorems, 45 equations, 7 figures, 2 algorithms)


Introduction
Preliminaries
Notation and terminology
Graph theory
Markov decision process
Multi-agent MDP
Continuous-time distributed dynamic programming
Centralized dynamic programming
Distributed dynamic programming version 1
Distributed dynamic programming version 2
Conclusion
Appendix
Proof of Proposition 2
Proof of Proposition 3
Proof of Proposition \ref{['proposition:3']}
...and 1 more sections

Key Result

Proposition 1

$\bar{\theta}^*$ is a unique asymptotically stable equilibrium point of the linear system in eq:1, i.e., $\bar{\theta}_t \to \bar{\theta}^*$ as $t \to \infty$.
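The kind of statement made in Proposition 1 can be illustrated with a small numerical sketch. The exact form of the linear system is not reproduced on this page, so the code below assumes a tabular policy-evaluation flow $\dot{\theta}_t = \bar{r} + \gamma P \theta_t - \theta_t$, where $\bar{r}$ is the average of the agents' local rewards. The transition matrix `P`, the rewards `local_rewards`, the discount factor, and the Euler step are hypothetical values chosen for illustration.

```python
def centralized_dp_flow(P, local_rewards, gamma=0.9, step=0.1, num_steps=3000):
    """Forward-Euler simulation of d(theta)/dt = r_bar + gamma * P @ theta - theta.

    This is an assumed tabular form, not necessarily the paper's system.
    The matrix gamma * P - I is Hurwitz for gamma < 1 because P is row
    stochastic, so the flow has a single, globally attracting equilibrium.
    """
    n = len(P)
    # Centralized reward: average of the agents' local reward vectors.
    r_bar = [sum(r[s] for r in local_rewards) / len(local_rewards) for s in range(n)]
    theta = [0.0] * n
    for _ in range(num_steps):
        backup = [r_bar[s] + gamma * sum(P[s][k] * theta[k] for k in range(n))
                  for s in range(n)]
        theta = [theta[s] + step * (backup[s] - theta[s]) for s in range(n)]
    return theta, r_bar


# Hypothetical two-state chain with three agents' local rewards.
P = [[0.9, 0.1],
     [0.2, 0.8]]
local_rewards = [[2.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
theta, r_bar = centralized_dp_flow(P, local_rewards)

# Bellman residual at the final iterate; it vanishes at the equilibrium.
residual = [r_bar[s] + 0.9 * sum(P[s][k] * theta[k] for k in range(2)) - theta[s]
            for s in range(2)]
```

In this sketch the equilibrium is the solution of $(I - \gamma P)\theta = \bar{r}$, and the residual shrinking toward zero is the numerical counterpart of $\bar{\theta}_t \to \bar{\theta}^*$. The distributed schemes in the paper aim to reach the same point without any agent having access to $\bar{r}$.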

Figures (7)

Figure 1: Network topology of five RL agents.
Figure 2: Algorithm 1: Evolution of the first entries of $\theta_t^1$, $\theta_t^2$, $\theta_t^3$, $\theta_t^4$, and $\theta_t^5$.
Figure 3: Algorithm 1: Evolution of the second entries of $\theta_t^1$, $\theta_t^2$, $\theta_t^3$, $\theta_t^4$, and $\theta_t^5$.
Figure 4: Algorithm 1: Evolution of $\left\| \bar{\theta}_t - \bar{\theta}^* \right\|_2$.
Figure 5: Algorithm 2: Evolution of the first entries of $w_t^1$, $w_t^2$, $w_t^3$, $w_t^4$, and $w_t^5$.
...and 2 more figures

Theorems & Definitions (7)

Proposition 1
proof
Proposition 2: Equilibrium points
Proposition 3: Global asymptotic stability
Proposition 4: Equilibrium points
Proposition 5: Global asymptotic stability
Example 1
