Finite Horizon Multi-Agent Reinforcement Learning in Solving Optimal Control of State-Dependent Switched Systems

Mi Zhou, Jiazhi Li, Masood Mortazavi, Ning Yan, Chaouki Abdallah

TL;DR

The switched learning-based multi-agent method is compared with vanilla DDPG in two customized demonstrative environments with one- and two-dimensional state spaces.

Abstract

In this article, a State-dependent Multi-Agent Deep Deterministic Policy Gradient (SMADDPG) method is proposed to learn an optimal control policy for regionally switched systems. We observe good performance of this method and explain it in rigorous mathematical language, under some simplifying assumptions that motivate the ideas and allow them to be applied to some canonical examples. The performance of the switched learning-based multi-agent method is compared with vanilla DDPG in two customized demonstrative environments with one- and two-dimensional state spaces.
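
The phrase "state-dependent multi-agent" can be read as one agent per region of the state space, with the active agent chosen by the current state. The following is only a minimal sketch of that dispatch idea, not the authors' implementation: `LinearActor` is a hypothetical stand-in for a trained DDPG actor network, and the threshold-based partition in `region_index` assumes a one-dimensional state.

```python
import numpy as np


class LinearActor:
    """Hypothetical stand-in for a trained actor: u = clip(gain . x)."""

    def __init__(self, gain, u_max=1.0):
        self.gain = np.asarray(gain, dtype=float)
        self.u_max = u_max

    def act(self, x):
        # Deterministic policy: linear feedback, saturated at +/- u_max.
        u = self.gain @ np.atleast_1d(x)
        return float(np.clip(u, -self.u_max, self.u_max))


def region_index(x, boundaries):
    """Index of the region containing scalar state x.

    `boundaries` is a sorted 1-D array of switching thresholds (an assumed
    partition); a state exactly on a boundary belongs to the region above it.
    """
    return int(np.searchsorted(boundaries, x, side="right"))


def select_action(x, actors, boundaries):
    """Query the actor assigned to the region that currently contains x."""
    return actors[region_index(x, boundaries)].act(x)


# Two regions split at x = 0, each with its own (hypothetical) actor.
boundaries = np.array([0.0])
actors = [LinearActor([-1.0]), LinearActor([-2.0])]
u_left = select_action(-0.5, actors, boundaries)   # handled by actors[0]
u_right = select_action(0.3, actors, boundaries)   # handled by actors[1]
```

In a full method each region's actor (and critic) would be trained from transitions collected in that region; the sketch covers only the state-dependent selection step.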

Paper Structure

This paper contains 11 sections, 4 theorems, 19 equations, 4 figures, 1 table, 1 algorithm.

Key Result

Theorem 3.1

The value function of the optimal control problem defined above for a hybrid system is continuous but may not be differentiable at the switching interface, with switching state $x(\tau)$ and switching time instant $\tau$.
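
To illustrate how such a kink can arise, consider a toy minimum-time problem (chosen here for simplicity; it is not one of the paper's examples). Assume scalar dynamics $\dot x = u$ for $x < 0$ and $\dot x = 2u$ for $x \ge 0$, with $|u| \le 1$ and target state $x_f = 1$. Moving right at full speed is optimal, so for $x \le 1$ the minimum time to reach the target is:

```latex
V(x) =
\begin{cases}
  -x + \tfrac{1}{2}, & x < 0, \\[4pt]
  \tfrac{1 - x}{2},  & 0 \le x \le 1,
\end{cases}
\qquad
\lim_{x \to 0^-} V(x) = \tfrac{1}{2} = V(0),
\qquad
V'(0^-) = -1 \;\neq\; -\tfrac{1}{2} = V'(0^+).
```

The two branches agree at the interface $x = 0$, so $V$ is continuous there, but the one-sided derivatives differ because the maximum speed changes across the interface.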

Figures (4)

  • Figure 1: Multiple region dynamical systems: (a) systems illustration; (b) system transition graph.
  • Figure 2: (a) Optimal control of Example 1 (switching time $\tau=0.4694$ and optimal cost $J=1.0209$); (b) Example 2 (switching time $\tau=0.3132$ and optimal cost $J=6.5274$).
  • Figure 3: FO: (a) Episode reward (fixed seed); (b) Learned control policy (fixed seed); (c) Average episodic reward over 10 runs with randomly generated seeds.
  • Figure 4: MR: (a) Episode reward (fixed seed); (b) Learned control policy (fixed seed); (c) Average episodic reward over 10 runs with randomly generated seeds.

Theorems & Definitions (6)

  • Theorem 3.1
  • proof
  • Theorem 3.2
  • Theorem 3.3: Approximation theory of deep neural networks [NNerror, Theorem 4.16]
  • Theorem 3.4
  • proof