Distributed Policy Gradient for Linear Quadratic Networked Control with Limited Communication Range

Yuzi Yan, Yuan Shen

TL;DR

This paper proposes a scalable distributed policy gradient method for multi-agent linear quadratic networked systems and proves that it converges to a near-optimal solution. It also shows that increasing the communication range improves system stability during gradient descent, which exposes a critical trade-off.

Abstract

This paper proposes a scalable distributed policy gradient method and proves its convergence to a near-optimal solution in multi-agent linear quadratic networked systems. The agents interact over a specified network under local communication constraints, meaning that each agent can exchange information only with a limited number of neighboring agents. On the underlying graph of the network, each agent computes its control input from the states of its nearby neighbors in the linear quadratic control setting. We show that the exact gradient can be approximated using only local information. Compared with the centralized optimal controller, the performance gap decreases to zero exponentially as the communication and control ranges increase. We also demonstrate how increasing the communication range enhances system stability in the gradient descent process, thereby elucidating a critical trade-off. The simulation results verify our theoretical findings.
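The locality constraint described in the abstract can be illustrated with a short sketch. It is not the paper's algorithm: it assumes one scalar state and one scalar input per agent, and it assumes that a controller with control range `r` is a gain matrix whose entry `(i, j)` may be nonzero only when agents `i` and `j` are at most `r` hops apart. The function names `hop_distances`, `range_mask`, and `project` are hypothetical.

```python
from collections import deque

import numpy as np


def hop_distances(adj):
    """All-pairs hop distances by BFS from every node (unreachable pairs stay inf)."""
    n = len(adj)
    dist = np.full((n, n), np.inf)
    for s in range(n):
        dist[s, s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if np.isinf(dist[s, v]):  # first visit gives the shortest hop count
                    dist[s, v] = dist[s, u] + 1
                    queue.append(v)
    return dist


def range_mask(adj, r):
    """Boolean mask: True where agent j lies within r hops of agent i (assumed structure)."""
    return hop_distances(adj) <= r


def project(K, mask):
    """Zero every entry of K that couples agents outside the allowed range."""
    return np.where(mask, K, 0.0)


# Line graph with 5 agents: 0 - 1 - 2 - 3 - 4
n = 5
line = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

mask = range_mask(line, r=1)
K_dense = np.arange(1.0, n * n + 1).reshape(n, n)  # arbitrary dense gain for illustration
K_local = project(K_dense, mask)
```

On the line graph with `r = 1` the mask is tridiagonal, so `K_local` keeps only the entries that couple an agent to itself and its immediate neighbors. Applying `project` to a gradient step in the same way would keep every iterate inside the range-limited set, under the assumptions stated above.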

Paper Structure

This paper contains 23 sections, 15 theorems, 59 equations, 6 figures, 1 table, and 1 algorithm.

Key Result

Lemma 1

[Policy Gradient Theorem; yang2019provably] In LQR, the policy gradient can be equivalently expressed as

Figures (6)

  • Figure 1: An illustrative diagram of a networked system. The top figure illustrates the local control inputs, local controllers, and local observations; the bottom figure provides a global perspective, where the controller to be optimized is $\mathbf{K} \in \mathcal{M}^r$. Note that we omit the noise added to the control output for simplicity.
  • Figure 2: A diagram to show the spatial structures of the system matrices, including $\mathbf{A}$, $\mathbf{B}$, $\mathbf{Q}$ and $\mathbf{R}$.
  • Figure 3: The black oval represents the sub-level set $S_{C(\mathbf{K})}$. The red line and the green line represent the one-step move along the direction of $\mathcal{P}_{\mathcal{M}^r}(\nabla_{\mathbf{K}} C(\mathbf{K}))$ and $\widehat{\mathbf{h}}(\mathbf{K}) = \sum_{i=1}^n \widehat{\mathbf{h}}_i(\mathbf{K})$, respectively. The blue line represents the difference caused by the gradient approximation and thus depends on $\kappa$. If the blue circle is so large that $C(\mathbf{K}'')$ moves out of $S_{C(\mathbf{K})}$ into the orange area, i.e., $\kappa$ is so small that the approximation is too inaccurate, the system risks becoming unstable.
  • Figure 4: The diagram of the four representative graphs: line, circle, 2-ary tree and 4-regular grid.
  • Figure 5: Relative performance gap compared to the optimal controller $\mathbf{K}^*$ for different communication range limits $\kappa$ and different control ranges $r$. In semi-logarithmic plots with a logarithmic y-axis, the relative cost error curve is linear, which aligns with the theoretical results and confirms the main conclusion that the performance gap between $\mathbf{K}(T)$ and $\mathbf{K}^*$ decreases exponentially as $r$ and $\kappa$ increase.
  • ...and 1 more figures
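
The Figure 5 caption treats a straight line on a semi-logarithmic plot as evidence of exponential decay. The short derivation below shows why. It assumes the relative gap behaves like $c\,\rho^{\kappa}$, where the constants $c > 0$ and $\rho \in (0, 1)$ are hypothetical symbols introduced here for illustration and are not taken from the paper.

```latex
% Assumed model of the relative performance gap as a function of the range \kappa
\mathrm{gap}(\kappa) \approx c\,\rho^{\kappa}, \qquad c > 0,\ 0 < \rho < 1
% Taking logarithms gives a function that is affine in \kappa
\log \mathrm{gap}(\kappa) \approx \log c + \kappa \log \rho
% Since \log \rho < 0, the semi-log curve is a straight line with negative slope
```

Under this assumption the slope of the line equals $\log \rho$, so a steeper line corresponds to a faster decay of the gap. The same reasoning applies with the control range $r$ in place of $\kappa$.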

Theorems & Definitions (27)

  • Definition 1
  • Lemma 1
  • Remark 1
  • Definition 2: Exponential Decay Property
  • Lemma 2
  • Theorem 1
  • Corollary 1
  • Corollary 2
  • Corollary 3
  • Corollary 4
  • ...and 17 more