Neural Network Approaches for Parameterized Optimal Control

Deepanshu Verma; Nick Winovich; Lars Ruthotto; Bart van Bloemen Waanders

TL;DR

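This work compares two neural-network training paradigms for deterministic, finite-dimensional optimal control problems whose dynamics depend on unknown or uncertain parameters: a model-based approach that learns the value function and obtains the policy in feedback form, and actor-critic reinforcement learning, which approximates the policy in a data-driven way. On a two-dimensional convection-diffusion example, the model-based approach is more accurate and requires considerably fewer PDE solves.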

Abstract

We consider numerical approaches for deterministic, finite-dimensional optimal control problems whose dynamics depend on unknown or uncertain parameters. We seek to amortize the solution over a set of relevant parameters in an offline stage to enable rapid decision-making and be able to react to changes in the parameter in the online stage. To tackle the curse of dimensionality arising when the state and/or parameter are high-dimensional, we represent the policy using neural networks. We compare two training paradigms: First, our model-based approach leverages the dynamics and definition of the objective function to learn the value function of the parameterized optimal control problem and obtain the policy using a feedback form. Second, we use actor-critic reinforcement learning to approximate the policy in a data-driven way. Using an example involving a two-dimensional convection-diffusion equation, which features high-dimensional state and parameter spaces, we investigate the accuracy and efficiency of both training paradigms. While both paradigms lead to a reasonable approximation of the policy, the model-based approach is more accurate and considerably reduces the number of PDE solves.
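To make the phrase "obtain the policy using a feedback form" concrete, the following is a minimal sketch on a toy problem that is not the paper's convection-diffusion example. It assumes a scalar linear-quadratic problem with dynamics $z' = y z + u$, running cost $\tfrac12(z^2 + u^2)$, and terminal cost $\tfrac12 g z(T)^2$, where $y$ plays the role of the uncertain parameter. For this assumed problem the value function is $\tfrac12 P(s; y) z^2$, with $P$ solving a Riccati equation, so no neural network is needed; in the paper, a network approximates the value function instead. The function names `riccati_solution` and `closed_loop_cost` are hypothetical.

```python
import numpy as np

def riccati_solution(y, T=1.0, g=1.0, n_steps=2000):
    """Solve -dP/ds = 1 + 2*y*P - P**2 with P(T) = g on a uniform grid.

    The value function of the assumed toy problem is 0.5 * P(s; y) * z**2.
    """
    h = T / n_steps
    rhs = lambda P: 1.0 + 2.0 * y * P - P**2  # equals dP/dtau with tau = T - s
    P = np.empty(n_steps + 1)
    P[-1] = g
    for k in range(n_steps, 0, -1):
        # One classical RK4 step backward in s (forward in tau).
        k1 = rhs(P[k])
        k2 = rhs(P[k] + 0.5 * h * k1)
        k3 = rhs(P[k] + 0.5 * h * k2)
        k4 = rhs(P[k] + h * k3)
        P[k - 1] = P[k] + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return P

def closed_loop_cost(x0, y, P, T=1.0, g=1.0, feedback=True):
    """Simulate z' = y*z + u with forward Euler and accumulate the cost.

    With feedback=True the control is u = -dPhi/dz = -P(s) * z, which is the
    minimizer of the Hamiltonian for this toy problem; otherwise u = 0.
    """
    n_steps = len(P) - 1
    h = T / n_steps
    z, cost = x0, 0.0
    for k in range(n_steps):
        u = -P[k] * z if feedback else 0.0
        cost += h * 0.5 * (z**2 + u**2)
        z += h * (y * z + u)
    return cost + 0.5 * g * z**2

# The same feedback rule serves every parameter value once P(.; y) is known.
for y in (-0.5, 0.0, 0.5):
    P = riccati_solution(y)
    J_fb = closed_loop_cost(1.0, y, P)
    J_0 = closed_loop_cost(1.0, y, P, feedback=False)
    print(f"y={y:+.1f}  value={0.5 * P[0]:.4f}  feedback cost={J_fb:.4f}  u=0 cost={J_0:.4f}")
```

The simulated closed-loop cost should agree with the value function at the initial state up to discretization error, which is the consistency check a learned value function would also have to pass approximately.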

Paper Structure (19 sections, 1 theorem, 30 equations, 5 figures, 1 algorithm)

Introduction
Parameterized Optimal Control Problem
Related Work
Model-Based Approach
Learning Problem
Function Value Approximation
Numerical Implementation
Data-Driven Approach
Actor-Critic Models
RL Network Architecture
Numerical Results
Advection-Diffusion Problem Formulation
Discretization
Feedback Form for HJB
Parallel Implementation of RL environments
...and 4 more sections

Key Result

Theorem 5.2

Policy gradient theorem (Sutton and Barto, 2018): the gradient of the cumulative reward objective function (the return) can be expressed in terms of the policy gradients.
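For orientation, the textbook form of the policy gradient theorem for the episodic case is sketched below in the notation of Sutton and Barto, which is an assumption here and may differ from the paper's own symbols (the figures suggest the paper writes the state as $\boldsymbol{z}(s)$, the parameter as $\boldsymbol{y}$, and time as $s$). In this notation, $J(\theta)$ is the performance objective, $\mu$ the on-policy state distribution, $q_\pi$ the action-value function, and $\pi(a \mid s, \theta)$ the parameterized policy; for continuous state or action spaces the sums become integrals.

```latex
\nabla_\theta J(\theta) \;\propto\; \sum_{s} \mu(s) \sum_{a} q_\pi(s, a)\, \nabla_\theta \pi(a \mid s, \theta)
```

The practical point is that the right-hand side involves only gradients of the policy, not of the state distribution, which is what lets actor-critic methods estimate it from sampled trajectories.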

Figures (5)

Figure 1: Network architectures for the actor (left) and critic (right) components of the RL models. Both networks receive input arrays containing the values of $\boldsymbol{z}(s)$, $\boldsymbol{y}$, and $s$. Convolutional and max-pooling layers (blue) process the data received from the PDE environment to extract features. These features are then flattened and passed to dense layers (grey) to form the position, variance, and value predictions.
Figure 2: Example evolution of the advection-diffusion system.
Figure 3: Horizontal problem setup: (left) Validation loss during training and (right) number of PDE solves required for different target accuracies of control objective.
Figure 4: Sinusoidal problem setup: (left) Validation loss during training and (right) number of PDE solves required for different target accuracies of control objective.
Figure 5: Suboptimality, relative to the baseline, on validation problems for the horizontal (left column) and sinusoidal (right column) problem setups.

Theorems & Definitions (2)

Remark 5.1
Theorem 5.2
