Neural Network Approaches for Parameterized Optimal Control

Deepanshu Verma, Nick Winovich, Lars Ruthotto, Bart van Bloemen Waanders

TL;DR

This work compares two neural-network training paradigms for deterministic, finite-dimensional optimal control problems whose dynamics depend on unknown or uncertain parameters: a model-based approach that learns the value function and recovers the policy in feedback form, and actor-critic reinforcement learning, which approximates the policy in a data-driven way.

Abstract

We consider numerical approaches for deterministic, finite-dimensional optimal control problems whose dynamics depend on unknown or uncertain parameters. We seek to amortize the solution over a set of relevant parameters in an offline stage to enable rapid decision-making and be able to react to changes in the parameter in the online stage. To tackle the curse of dimensionality arising when the state and/or parameter are high-dimensional, we represent the policy using neural networks. We compare two training paradigms: First, our model-based approach leverages the dynamics and definition of the objective function to learn the value function of the parameterized optimal control problem and obtain the policy using a feedback form. Second, we use actor-critic reinforcement learning to approximate the policy in a data-driven way. Using an example involving a two-dimensional convection-diffusion equation, which features high-dimensional state and parameter spaces, we investigate the accuracy and efficiency of both training paradigms. While both paradigms lead to a reasonable approximation of the policy, the model-based approach is more accurate and considerably reduces the number of PDE solves.

Paper Structure (19 sections, 1 theorem, 30 equations, 5 figures, 1 algorithm)

Key Result

Theorem 5.2

(Sutton and Barto, 2018) The gradient of the cumulative reward objective function (the return) can be expressed in terms of the policy gradients.
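
The idea behind this result can be checked numerically on a toy problem. The following is a minimal sketch, not the paper's setup: it assumes a hypothetical one-step problem with a Gaussian policy $a \sim \mathcal{N}(\theta, \sigma^2)$ and a made-up reward $R(a) = -(a - 1)^2$, and uses the score-function identity $\nabla_\theta \mathbb{E}[R(a)] = \mathbb{E}[R(a)\, \nabla_\theta \log \pi_\theta(a)]$. The names `reward`, `policy_gradient_estimate`, and `exact_gradient` are illustrative only.

```python
import random


def reward(action, target=1.0):
    # Hypothetical one-step reward: penalize squared distance to a target.
    return -(action - target) ** 2


def policy_gradient_estimate(theta, sigma=1.0, n_samples=100_000, seed=0):
    """Monte Carlo estimate of d/dtheta E[R(a)] for a ~ N(theta, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        a = rng.gauss(theta, sigma)
        # Score function: d/dtheta log N(a; theta, sigma^2).
        score = (a - theta) / sigma ** 2
        total += reward(a) * score
    return total / n_samples


def exact_gradient(theta, target=1.0):
    # E[R] = -((theta - target)^2 + sigma^2), so the derivative is -2 (theta - target).
    return -2.0 * (theta - target)


estimate = policy_gradient_estimate(theta=0.0)
print(estimate, exact_gradient(0.0))
```

The sampled estimate only needs rewards and policy log-derivatives, not derivatives of the reward itself, which is what makes this form usable in a data-driven setting. The price is sampling noise, so many reward evaluations may be needed for an accurate gradient.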

Figures (5)

  • Figure 1: Network architectures for the actor (left) and critic (right) components of the RL models. Both networks receive input arrays containing the values of $\boldsymbol{z}(s)$, $\boldsymbol{y}$, and $s$. Convolutional and max-pooling layers (blue) process the data received from the PDE environment to extract features. These features are then flattened and passed to dense layers (grey) to form the position, variance, and value predictions.
  • Figure 2: Example evolution of the advection-diffusion system.
  • Figure 3: Horizontal problem setup: (left) Validation loss during training and (right) number of PDE solves required for different target accuracies of the control objective.
  • Figure 4: Sinusoidal problem setup: (left) Validation loss during training and (right) number of PDE solves required for different target accuracies of the control objective.
  • Figure 5: Suboptimality, relative to the baseline, on validation problems for the horizontal (left column) and sinusoidal (right column) problem setups.
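
The data flow described in the Figure 1 caption (convolution, max-pooling, flatten, dense heads) can be sketched in plain Python. This is a rough illustration under stated assumptions, not the authors' architecture: the grid size, kernel size, layer widths, the two-dimensional position output, the softplus used to keep the variance positive, and the choice to append $\boldsymbol{y}$ and $s$ after flattening are all hypothetical, as are all function and parameter names.

```python
import math
import random


def conv2d(image, kernel):
    """Valid 2D cross-correlation followed by a ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[max(0.0, sum(image[i + di][j + dj] * kernel[di][dj]
                          for di in range(kh) for dj in range(kw)))
             for j in range(out_w)] for i in range(out_h)]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; incomplete trailing windows are dropped."""
    return [[max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


def dense(x, weights, bias):
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def features(z, y, s, kernel):
    # Assumption: pooled features of the state grid z are flattened,
    # then the parameter vector y and the time s are appended.
    pooled = max_pool(conv2d(z, kernel))
    return [v for row in pooled for v in row] + list(y) + [s]


def actor(z, y, s, p):
    h = features(z, y, s, p["kernel"])
    position = dense(h, p["W_pos"], p["b_pos"])
    # Softplus keeps the predicted variance strictly positive.
    variance = [math.log1p(math.exp(v)) for v in dense(h, p["W_var"], p["b_var"])]
    return position, variance


def critic(z, y, s, p):
    h = features(z, y, s, p["kernel"])
    return dense(h, p["W_val"], p["b_val"])[0]


def make_params(n_features, heads, rng):
    # Small random weights; one 3x3 kernel and one dense layer per output head.
    p = {"kernel": [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(3)]}
    for name, width in heads.items():
        p["W_" + name] = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
                          for _ in range(width)]
        p["b_" + name] = [0.0] * width
    return p


rng = random.Random(0)
z = [[rng.random() for _ in range(8)] for _ in range(8)]  # hypothetical 8x8 state grid
y = [0.5, -0.2]                                           # hypothetical parameter vector
s = 0.1                                                   # time
n_features = 3 * 3 + len(y) + 1  # 8x8 -> conv 6x6 -> pool 3x3, plus y and s
actor_params = make_params(n_features, {"pos": 2, "var": 2}, rng)
critic_params = make_params(n_features, {"val": 1}, rng)
position, variance = actor(z, y, s, actor_params)
value = critic(z, y, s, critic_params)
```

The actor's position and variance outputs can be read as the mean and spread of a Gaussian policy, while the critic returns a single scalar value estimate. In practice such networks would be written in a deep learning framework with several convolutional layers and channels.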

Theorems & Definitions (2)

  • Remark 5.1
  • Theorem 5.2