Finite Horizon Multi-Agent Reinforcement Learning in Solving Optimal Control of State-Dependent Switched Systems

Mi Zhou; Jiazhi Li; Masood Mortazavi; Ning Yan; Chaouki Abdallah

TL;DR

The performance of the switched, learning-based multi-agent method is compared with that of vanilla DDPG in two customized demonstrative environments with one- and two-dimensional state spaces.

Abstract

In this article, a State-dependent Multi-Agent Deep Deterministic Policy Gradient (SMADDPG) method is proposed to learn an optimal control policy for regionally switched systems. We observe good performance of this method and explain it in rigorous mathematical language, using some simplifying assumptions to motivate the ideas and to apply them to some canonical examples. In two customized demonstrative reinforcement learning environments with one- and two-dimensional state spaces, the performance of this switched, learning-based multi-agent method is compared with that of vanilla DDPG.
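The abstract does not spell out how the agents are coordinated. A minimal sketch, assuming one agent per region of the state space: the agent responsible for the region containing the current state chooses the control, and that transition is routed to its own buffer. `region_of`, `LinearActor`, and `StateDependentPolicy` are hypothetical names; the linear actors only stand in for DDPG actor networks, and no critic or training loop is shown.

```python
import numpy as np


def region_of(x: np.ndarray) -> int:
    """Hypothetical region map: region 0 if x[0] < 0, otherwise region 1.
    Real switching surfaces are problem-specific."""
    return 0 if x[0] < 0.0 else 1


class LinearActor:
    """Stand-in for a DDPG actor network: u = clip(K @ x, -u_max, u_max)."""

    def __init__(self, gain, u_max: float = 1.0):
        self.gain = np.asarray(gain, dtype=float)
        self.u_max = u_max

    def act(self, x: np.ndarray) -> np.ndarray:
        return np.clip(self.gain @ x, -self.u_max, self.u_max)


class StateDependentPolicy:
    """One actor (and one transition buffer) per region; the region
    containing the current state selects which actor acts."""

    def __init__(self, actors):
        self.actors = actors
        self.buffers = [[] for _ in actors]

    def act(self, x: np.ndarray):
        k = region_of(x)
        return k, self.actors[k].act(x)

    def store(self, x, u, r, x_next):
        # Assumption: the transition goes to the buffer of the agent that acted.
        self.buffers[region_of(x)].append((x, u, r, x_next))


policy = StateDependentPolicy([LinearActor([[-0.5]]), LinearActor([[-2.0]])])
print(policy.act(np.array([-1.0])))  # region 0 acts
print(policy.act(np.array([2.0])))   # region 1 acts, control saturates
```

Keeping a separate actor per region lets each one fit a smooth policy within its region, so none of them has to represent the discontinuity at the switching surface.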

Paper Structure

This paper contains 11 sections, 4 theorems, 19 equations, 4 figures, 1 table, 1 algorithm.

Introduction
Problem formulation
Theoretical error analysis
  Hamilton-Jacobi-Bellman equation for hybrid systems
  Approximate dynamic programming and finite-horizon RL
  Approximation theory
State-Based Multi-Agent Deep Deterministic Policy Gradient (SMADDPG)
Illustrative examples
  Example 1: FO
  Example 2: MR
Conclusion

Key Result

Theorem 3.1

The value function of the optimal control problem defined above for a hybrid system is continuous but may not be differentiable at the switching interface, that is, at the switching state $x(\tau)$ reached at the switching time instant $\tau$.
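The phenomenon in Theorem 3.1 can be seen in a toy case. The sketch below assumes an uncontrolled scalar system with drift $\dot x = 1$ in both regions and a region-dependent running cost ($c_1$ for $x<0$, $c_2$ for $x\ge 0$) over horizon $T$; `C_LEFT`, `C_RIGHT`, `HORIZON`, and `value` are hypothetical names, and this is not one of the paper's examples. Analytically, $V(x) = c_2 T + (c_2 - c_1)x$ for $-T \le x < 0$ and $V(x) = c_2 T$ for $x \ge 0$, so $V$ is continuous at the interface $x = 0$, but its one-sided slopes differ whenever $c_1 \ne c_2$.

```python
import numpy as np

C_LEFT, C_RIGHT = 1.0, 3.0  # hypothetical running costs for x < 0 and x >= 0
HORIZON = 1.0               # hypothetical finite horizon T


def value(x0: float, horizon: float = HORIZON, n_steps: int = 20000) -> float:
    """Cost-to-go of the uncontrolled system dx/dt = 1 from x0,
    integrated with a left Riemann sum over [0, horizon]."""
    dt = horizon / n_steps
    t = np.arange(n_steps) * dt
    xs = x0 + t  # exact trajectory of dx/dt = 1
    costs = np.where(xs < 0.0, C_LEFT, C_RIGHT)  # region-dependent running cost
    return float(np.sum(costs) * dt)


h = 1e-2
left_slope = (value(0.0) - value(-h)) / h   # should be close to C_RIGHT - C_LEFT
right_slope = (value(h) - value(0.0)) / h   # should be 0
print("jump across interface:", value(-1e-6) - value(0.0))
print("left slope:", left_slope, "right slope:", right_slope)
```

The jump shrinks with the step size, while the slope mismatch stays at roughly $c_2 - c_1$: continuity without differentiability at the switching interface.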

Figures (4)

Figure 1: Multiple-region dynamical systems: (a) system illustration; (b) system transition graph.
Figure 2: Optimal control of (a) Example 1 (switching time $\tau=0.4694$, optimal cost $J=1.0209$); (b) Example 2 (switching time $\tau=0.3132$, optimal cost $J=6.5274$).
Figure 3: FO: (a) Episode reward (fixed seed); (b) Learned control policy (fixed seed); (c) Average episodic reward over 10 runs with randomly generated seeds.
Figure 4: MR: (a) Episode reward (fixed seed); (b) Learned control policy (fixed seed); (c) Average episodic reward over 10 runs with randomly generated seeds.
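Panels (c) of Figures 3 and 4 report the average episodic reward over 10 runs with randomly generated seeds. A minimal sketch of that aggregation, assuming every run logs the same number of episodes; `run_training` is a hypothetical placeholder that returns a synthetic curve, not the paper's training loop or data, and `seed_averaged_curve` is likewise a hypothetical helper.

```python
import numpy as np


def run_training(seed: int, n_episodes: int = 50) -> np.ndarray:
    """Hypothetical placeholder for one training run.
    Returns a synthetic per-episode reward curve; replace with a real run."""
    rng = np.random.default_rng(seed)
    episodes = np.arange(n_episodes)
    return -np.exp(-episodes / 10.0) + 0.05 * rng.standard_normal(n_episodes)


def seed_averaged_curve(n_runs: int = 10, master_seed: int = 0):
    """Draw one random seed per run, stack the reward curves, and return
    the per-episode mean and standard deviation across runs."""
    seeds = np.random.default_rng(master_seed).integers(0, 2**31 - 1, size=n_runs)
    curves = np.stack([run_training(int(s)) for s in seeds])  # (n_runs, n_episodes)
    return curves.mean(axis=0), curves.std(axis=0)


mean, std = seed_averaged_curve()
print(mean[:5], std[:5])
```

Plotting `mean` with a shaded band of `mean ± std` gives the usual seed-averaged learning curve.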

Theorems & Definitions (6)

Theorem 3.1
Proof
Theorem 3.2
Theorem 3.3: Approximation theory of deep neural networks (Theorem 4.16)
Theorem 3.4
Proof