HypeRL: Parameter-Informed Reinforcement Learning for Parametric PDEs

Nicolò Botteghi, Stefania Fresca, Mengwu Guo, Andrea Manzoni

TL;DR

This work tackles optimal control of parametric PDEs by introducing HypeRL, a parameter-informed DRL framework that combines TD3 with hypernetworks to condition policy and value networks on PDE parameters $\boldsymbol{\mu}$. The hypernetworks $h(\boldsymbol{\mu};\boldsymbol{\theta}_h)$ generate the main networks' weights, enabling accurate, generalizable control policies that interpolate or extrapolate across parameter variations. By avoiding explicit solution of the Hamilton–Jacobi–Bellman equation for every parameter instance, HypeRL achieves improved sample efficiency and generalization on challenging benchmarks, demonstrated on a 1D Kuramoto–Sivashinsky system with in-domain actuation and a 2D Navier–Stokes system with boundary control. The results indicate that parameter-informed encoding is key to robust, real-time PDE control in the presence of varying physics, with potential for broader application to PDE-constrained control problems.

Abstract

In this work, we devise a new, general-purpose reinforcement learning strategy for the optimal control of parametric partial differential equations (PDEs). Such problems frequently arise in applied sciences and engineering and entail significant complexity when control and/or state variables are distributed in high-dimensional space or depend on varying parameters. Traditional numerical methods, relying on either iterative minimization algorithms or dynamic programming, while reliable, often become computationally infeasible. Indeed, in either case, the optimal control problem must be solved for each instance of the parameters, and this is out of reach when dealing with high-dimensional, time-dependent, and parametric PDEs. In this paper, we propose HypeRL, a deep reinforcement learning (DRL) framework to overcome the limitations shown by traditional methods. HypeRL aims at approximating the optimal control policy directly. Specifically, we employ an actor-critic DRL approach to learn an optimal feedback control strategy that can generalize across the range of variation of the parameters. To effectively learn such optimal control laws, encoding the parameter information into the DRL policy and value function neural networks (NNs) is essential. To do so, HypeRL uses two additional NNs, often called hypernetworks, to learn the weights and biases of the value function and the policy NNs. We validate the proposed approach on two PDE-constrained optimal control benchmarks, namely a 1D Kuramoto-Sivashinsky equation and the 2D Navier-Stokes equations. The results show that the knowledge of the PDE parameters and how this information is encoded, i.e., via a hypernetwork, is an essential ingredient for learning parameter-dependent control policies that can generalize effectively to unseen scenarios and for improving the sample efficiency of such policies.
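
The hypernetwork idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the layer sizes, the `tanh` activations, the random initialization, and all function names (`init_hypernet`, `hypernet`, `unflatten`, `hyper_policy`) are assumptions made for illustration, and no training loop (TD3 or otherwise) is included. The sketch only shows how a hypernetwork $h(\boldsymbol{z};\boldsymbol{\theta}_h)$ can output the weights and biases of a small policy network, with either $\boldsymbol{z}=\boldsymbol{\mu}$ or $\boldsymbol{z}=[\boldsymbol{y},\boldsymbol{\mu}]$ as hypernetwork input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
STATE_DIM, ACTION_DIM, PARAM_DIM, HIDDEN = 8, 2, 1, 16

# Shapes of the main policy network's weights and biases (one hidden layer).
MAIN_SHAPES = [(HIDDEN, STATE_DIM), (HIDDEN,), (ACTION_DIM, HIDDEN), (ACTION_DIM,)]
N_MAIN = sum(int(np.prod(s)) for s in MAIN_SHAPES)


def init_hypernet(in_dim, hidden=32):
    """Random parameters theta_h of a small MLP hypernetwork (assumed design)."""
    return {
        "W1": rng.normal(0.0, 0.1, (hidden, in_dim)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (N_MAIN, hidden)),
        "b2": np.zeros(N_MAIN),
    }


def hypernet(z, theta_h):
    """h(z; theta_h): map the input z to the flat vector of main-network parameters."""
    hidden = np.tanh(theta_h["W1"] @ z + theta_h["b1"])
    return theta_h["W2"] @ hidden + theta_h["b2"]


def unflatten(flat):
    """Split the flat vector into arrays with the main network's shapes."""
    out, start = [], 0
    for shape in MAIN_SHAPES:
        size = int(np.prod(shape))
        out.append(flat[start:start + size].reshape(shape))
        start += size
    return out


def hyper_policy(y, mu, theta_h, state_in_hypernet=False):
    """Action from a policy whose weights are generated by the hypernetwork."""
    # Hypernetwork input: z = mu, or z = [y, mu] when the state is included.
    z = np.concatenate([y, mu]) if state_in_hypernet else mu
    W1, b1, W2, b2 = unflatten(hypernet(z, theta_h))
    # The main network always acts on the state y; tanh bounds actions in [-1, 1].
    return np.tanh(W2 @ np.tanh(W1 @ y + b1) + b2)


y = rng.normal(size=STATE_DIM)                     # placeholder PDE state
theta_mu = init_hypernet(PARAM_DIM)                # hypernetwork fed with mu only
theta_ymu = init_hypernet(STATE_DIM + PARAM_DIM)   # hypernetwork fed with [y, mu]

a_low = hyper_policy(y, np.array([-0.062]), theta_mu)
a_high = hyper_policy(y, np.array([0.25]), theta_mu)
a_joint = hyper_policy(y, np.array([0.25]), theta_ymu, state_in_hypernet=True)
```

For the same state `y`, the two parameter values produce different policy weights and therefore different actions, which is the mechanism by which the policy is conditioned on $\boldsymbol{\mu}$. In a full actor-critic setup, only the hypernetwork parameters would be trained, and a second hypernetwork would play the same role for the value function.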

Paper Structure

This paper contains 21 sections, 31 equations, 11 figures, 4 tables, 2 algorithms.

Figures (11)

  • Figure 1: HypeRL for parametric PDE-constrained OC. We rely on a hypernetwork $h(\boldsymbol{\mu}; \boldsymbol{\theta}_{h_{\pi}})$ to learn, from the PDE parameters $\boldsymbol{\mu}$, the weights and biases of the policy (and value function) neural network.
  • Figure 2: Hyper policy and hyper value function architectures. In the $\boldsymbol{\mu}$-only hyper policy and hyper value function panels, only the PDE parameters are used as input to the hypernetwork, i.e., $\boldsymbol{z}_k=\boldsymbol{\mu}$, while in the remaining hyper policy and hyper value function panels both the PDE state and the parameters are used as input to the hypernetwork, i.e., $\boldsymbol{z}_k=[\boldsymbol{y}_k, \boldsymbol{\mu}]$.
  • Figure 3: Training and evaluation results. The solid line represents the mean and the shaded area the standard deviation over 5 different random seeds.
  • Figure 4: $95\%$ confidence intervals of the training and evaluation reward for the different phases of training and evaluation. This metric, suggested by Agarwal et al. (2021), makes it possible to assess the reliability of the results while accounting for the stochastic nature of the RL experiments.
  • Figure 5: Controlled solutions for $\mu=-0.062$ and $\mu=0.25$.
  • ...and 6 more figures
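
As a rough illustration of how a $95\%$ confidence interval over a handful of random seeds (as in Figure 4) can be estimated, the sketch below uses a plain percentile bootstrap of the mean. This is an assumption: the listing above does not state which estimator or aggregate the paper uses, the function name `bootstrap_ci` is hypothetical, and the reward values are made-up placeholders rather than results from the paper.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of per-seed scores."""
    rng = random.Random(seed)
    # Resample the seeds with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    alpha = (1.0 - level) / 2.0
    lower = means[int(alpha * n_resamples)]
    upper = means[int((1.0 - alpha) * n_resamples) - 1]
    return lower, upper


# Placeholder per-seed rewards (NOT results from the paper), one value per seed.
seed_rewards = [-12.0, -10.5, -11.2, -9.8, -13.1]
low, high = bootstrap_ci(seed_rewards)
print(f"95% CI for the mean reward: [{low:.2f}, {high:.2f}]")
```

With only five seeds the interval is necessarily coarse, which is one reason to report it alongside the mean and standard deviation rather than instead of them.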

Theorems & Definitions (1)

  • Remark 1