Mean-Field Approximation of Cooperative Constrained Multi-Agent Reinforcement Learning (CMARL)

Washim Uddin Mondal, Vaneet Aggarwal, Satish V. Ukkusuri

TL;DR

This work addresses constrained multi-agent reinforcement learning (CMARL) in large homogeneous populations by leveraging constrained mean-field control (CMFC) as a scalable surrogate. It proves that solving a CMFC problem with a suitably adjusted constraint yields near-optimal CMARL policies, with an explicit finite-$N$ error bound $e=O\left(\dfrac{\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}}{\sqrt{N}}\right)$ (improvable to $e=O\left(\dfrac{\sqrt{|\mathcal{X}|}}{\sqrt{N}}\right)$ under a special independence condition). The authors introduce a Natural Policy Gradient-based primal-dual algorithm to solve the CMFC problem and demonstrate that its solution approximates the CMARL optimum within $O(e)$, with sample complexity $O(e^{-6})$. Through a sandwiching proof, they show the CMFC-derived policy satisfies the CMARL constraint when deployed in the $N$-agent system and achieves near-optimal rewards as $N$ grows large. Experimental results on a network of competing firms illustrate the decay of the approximation error with $N$ and the alignment between $N$-agent and mean-field performance.

Abstract

Mean-Field Control (MFC) has recently been proven to be a scalable tool for approximately solving large-scale multi-agent reinforcement learning (MARL) problems. However, these studies are typically limited to the unconstrained cumulative reward maximization framework. In this paper, we show that one can use the MFC approach to approximate the MARL problem even in the presence of constraints. Specifically, we prove that an $N$-agent constrained MARL problem, with the state and action spaces of each individual agent being of sizes $|\mathcal{X}|$ and $|\mathcal{U}|$ respectively, can be approximated by an associated constrained MFC problem with an error $e\triangleq \mathcal{O}\left([\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}]/\sqrt{N}\right)$. In the special case where the reward, cost, and state transition functions are independent of the action distribution of the population, we prove that the error can be improved to $e=\mathcal{O}(\sqrt{|\mathcal{X}|}/\sqrt{N})$. We also provide a Natural Policy Gradient based algorithm and prove that it can solve the constrained MARL problem within an error of $\mathcal{O}(e)$ with a sample complexity of $\mathcal{O}(e^{-6})$.
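
To make the scaling in the abstract concrete, the following is a minimal sketch that evaluates only the leading terms of the stated bounds, with all hidden constants set to 1. That choice is an assumption for illustration, not a value from the paper. The function names `error_scale` and `sample_complexity_scale` and the example sizes are hypothetical.

```python
import math


def error_scale(num_states, num_actions, num_agents, action_coupled=True):
    """Leading term of the approximation error, with the hidden constant set to 1.

    action_coupled=True  -> (sqrt(|X|) + sqrt(|U|)) / sqrt(N)
    action_coupled=False -> sqrt(|X|) / sqrt(N), the special case in which the
    reward, cost, and transition functions do not depend on the population's
    action distribution.
    """
    numerator = math.sqrt(num_states)
    if action_coupled:
        numerator += math.sqrt(num_actions)
    return numerator / math.sqrt(num_agents)


def sample_complexity_scale(e):
    """Leading term e^(-6) of the stated sample complexity (constant set to 1)."""
    return e ** -6


if __name__ == "__main__":
    # Hypothetical sizes: |X| = 4 states, |U| = 9 actions.
    for n in (100, 400, 1600):
        e = error_scale(4, 9, n)
        print(f"N={n:5d}  e~{e:.4f}  samples~{sample_complexity_scale(e):.0f}")
```

Under these assumptions, quadrupling $N$ halves the error term, while the $e^{-6}$ sample-complexity term grows by a factor of $2^6=64$.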

Paper Structure

This paper contains 31 sections, 20 theorems, 105 equations, 1 figure, 1 table, 2 algorithms.

Key Result

Lemma 1

Let $\boldsymbol{x}_0^N$ denote the initial joint state in an $N$-agent system, and let $\boldsymbol{\mu}_0$ be its empirical distribution. If Assumptions 1 and 2 hold, then there exists a sufficiently large $N_0>0$ such that, for all $N\geq N_0$, the following inequality holds whenever [...] The terms $G_R, G_C$ are defined as shown below, where $C_P\triangleq 2+L_P$, $S_J\triangleq (M_J+$ [...]

Figures (1)

  • Figure 1: Fig. 1(a) portrays the percentage error (as defined in the paper) as a function of $N$. Fig. 1(b) plots the $N$-agent (orange) and infinite-agent (blue) cost values corresponding to the optimal mean-field policy as a function of $N$. It also shows that both of these values lie below the specified upper bound, $\zeta$ (green). The values of the system parameters are: $\alpha_R=1$, $\beta_R=0.5$, $\lambda_R=0.5$, $\lambda_C=1$, $\zeta=5$, $\gamma=0.9$, and $Q=10$. The hyperparameters used in Algorithm 1 are chosen as follows: $\eta_1=\eta_2=\alpha=10^{-3}$, $J=L=10^2$. The bold lines and the half-widths of the shaded regions denote, respectively, the mean values and the standard deviations obtained over $25$ random seeds. The experiments were performed on a $1.8$ GHz Dual-Core Intel $i5$ processor with $8$ GB $1600$ MHz DDR3 memory.

Theorems & Definitions (20)

  • Lemma 1
  • Theorem 1
  • Lemma 2
  • Theorem 2
  • Theorem 3
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Lemma 6
  • Lemma 7
  • ...and 10 more