Optimal Control of Agent-Based Dynamics under Deep Galerkin Feedback Laws

Frederik Kelbel

TL;DR

This paper proposes a drift relaxation-based sampling approach to alleviate the symptoms of high-variance policy approximations in the Deep Galerkin Method and shows improvements on the Linear-Quadratic Regulator problem over the Deep FBSDE approach.

Abstract

Ever since the concepts of dynamic programming were introduced, one of the most difficult challenges has been to adequately address high-dimensional control problems. With growing dimensionality, the utilisation of Deep Neural Networks promises to circumvent the issue of an otherwise exponentially increasing complexity. The paper specifically investigates the sampling issues the Deep Galerkin Method is subjected to. It proposes a drift relaxation-based sampling approach to alleviate the symptoms of high-variance policy approximations. This is validated on mean-field control problems; namely, the variations of the opinion dynamics presented by the Sznajd and the Hegselmann-Krause model. The resulting policies induce a significant cost reduction over manually optimised control functions and show improvements on the Linear-Quadratic Regulator problem over the Deep FBSDE approach.

Paper Structure (15 sections, 2 theorems, 24 equations, 8 figures, 3 algorithms)

This paper contains 15 sections, 2 theorems, 24 equations, 8 figures, 3 algorithms.

INTRODUCTION
BACKGROUND
Stochastic Optimal Control
The Hamilton-Jacobi-Bellman Equation
The Deep Galerkin Method
Agent-Based Dynamics
THE SAMPLING PROBLEM
NUMERICAL EVALUATION
Linear Quadratic Regulator
Sznajd Model
Hegselmann-Krause Model
CONCLUSIONS
Acknowledgements
Proof of Theorem \ref{th:bound}
Proof of Theorem \ref{th:bound_2}

Key Result

Theorem 1

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. Then the $L^2$-error of the value function $J_\theta$ with respect to the true value function $J$ at time $t$ is bounded from above by the residuals of the Hamilton-Jacobi-Bellman equation. Proof: See appendix.

Figures (8)

Figure 1: $\mathcal{E}(J_\theta; \mathcal{B}_{\pi_U})$ (Error) and $\mathcal{E}(J_\theta; \mathcal{B}_\mathbb{P})$ (Error P) determined w.r.t. a model trained on a uniform sampling scheme, i.e. samples were drawn from a uniform measure $\pi_U$ during training. The latter error was computed on a batch produced by the Euler-Maruyama scheme using the model's policy. The $\mathcal{E}(J_\theta; \mathcal{B}_\mathbb{P})$-error shows a high variance and poor convergence properties at a high magnitude. $\mathcal{E}(J_\theta; \mathcal{B}_{\pi_U})$ converges to a value of $0.87$, whereas $\mathcal{E}(J_\theta; \mathcal{B}_\mathbb{P})$ oscillates between $3$ and $4$ (produced on the Sznajd model as per Section \ref{sec:evaluation}).
Figure 2: $\mathcal{E}(J_\theta; \mathcal{B}_\mathbb{P})$ trained with a $\mathbb{P}$ sampling scheme as per Algorithm \ref{alg:controlled_path} (produced on the Sznajd model as per Section \ref{sec:evaluation}). The error converges to zero.
Figure 3: Controlled dynamics for the Linear Quadratic Regulator with $2$ agents. The interval is discretised with $100$ time points. Using a sampler as in Algorithm \ref{alg:controlled_path}, the Deep Galerkin Method performs better and at a lower complexity than an FBSDE approach.
Figure 4: Uncontrolled opinion dynamics in accordance with the Sznajd model and $20$ agents. The interval $[0, 5]$ is discretised with $100$ time points. The problem is specified with $\beta=-3$, $\sigma=0.01$, $\gamma=0.04$, and $\lambda=1$. Each agent is associated with one coloured line. The bold blue line represents the empirical average.
Figure 5: Controlled opinion dynamics in accordance with the Sznajd model and $20$ agents. The interval $[0, 5]$ is discretised with $100$ time points. The problem is specified with $\beta=-3$, $\sigma=0.01$, $\gamma=0.04$, and $\lambda=1$. The lower graph shows the associated cumulative cost in comparison to the policy $\upsilon^{(\alpha)}(x) = \alpha (x_d - x)$. The optimal value lies between $6$ and $8$; however, the DGM policy performs better in every case. Each agent is associated with one coloured line. The bold blue line in the upper graph represents the empirical average.
...and 3 more figures
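
The captions above refer to paths produced by the Euler-Maruyama scheme under a feedback policy, and to the baseline policy $\upsilon^{(\alpha)}(x) = \alpha (x_d - x)$. The following is a minimal sketch of such a simulation, not the paper's Algorithm. The function names, the zero placeholder drift, the target $x_d = 0.5$, and the choice $\alpha = 1$ are all assumptions made for illustration; the actual Sznajd drift is not reproduced here.

```python
import numpy as np


def euler_maruyama(x0, policy, drift, sigma, t_end, n_steps, rng):
    """Simulate dX = (drift(X) + policy(t, X)) dt + sigma dW on [0, t_end].

    Hypothetical helper: `drift` and `policy` are user-supplied callables.
    Returns an array of shape (n_steps + 1, n_agents).
    """
    dt = t_end / n_steps
    path = np.empty((n_steps + 1, x0.shape[0]))
    path[0] = x0
    for k in range(n_steps):
        t = k * dt
        x = path[k]
        # Brownian increment with variance dt for each agent
        dw = rng.normal(0.0, np.sqrt(dt), size=x.shape)
        path[k + 1] = x + (drift(x) + policy(t, x)) * dt + sigma * dw
    return path


def linear_feedback(alpha, x_d):
    """Baseline policy v(x) = alpha * (x_d - x) from the Figure 5 caption."""
    return lambda t, x: alpha * (x_d - x)


def zero_drift(x):
    """Placeholder drift (assumption); replace with the model's own drift."""
    return np.zeros_like(x)


rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=20)  # 20 agents, arbitrary initial opinions
path = euler_maruyama(x0, linear_feedback(1.0, 0.5), zero_drift,
                      sigma=0.01, t_end=5.0, n_steps=100, rng=rng)
# Noise-free run, useful as a sanity check on the discretisation
path_det = euler_maruyama(x0, linear_feedback(1.0, 0.5), zero_drift,
                          sigma=0.0, t_end=5.0, n_steps=100, rng=rng)
```

With `sigma=0.0` each step contracts the distance to `x_d` by the factor `1 - alpha * dt`, which gives a quick check that the scheme is wired correctly before substituting a learned policy for `linear_feedback`.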

Theorems & Definitions (2)

Theorem 1
Theorem 2
