Reinforcement learning-based estimation for partial differential equations

Saviz Mowlavi; Mouhacine Benosman

TL;DR

This work addresses state estimation for high-dimensional, parametric PDEs from sparse measurements. It introduces the RL-ROE, a reduced-order estimator that replaces the Kalman gain with a nonlinear policy learned by reinforcement learning. The estimator is built on a Dynamic Mode Decomposition (DMD) reduced-order model (ROM) in which the high-dimensional state is approximated as ${\bm{z}}_k \approx {\bm{U}}{\bm{x}}_k$, with reduced dynamics ${\bm{x}}_k = {\bm{A}}_r{\bm{x}}_{k-1} + {\bm{w}}_{k-1}$ and measurements ${\bm{y}}_k = {\bm{C}}_r{\bm{x}}_k + {\bm{v}}_k$. The policy ${\boldsymbol{\pi}}_{\boldsymbol{\theta}}$ produces corrective actions ${\bm{a}}_k$ from the measurements ${\bm{y}}_k$ and the prior estimates, and is trained offline with proximal policy optimization (PPO) using domain randomization over the physical parameter $\mu$ so that it generalizes across regimes. Results on the Burgers and Navier-Stokes equations show that the RL-ROE substantially outperforms a Kalman-filter-based ROE (KF-ROE) when sensors are few, achieving near-lower-bound accuracy and remaining robust to unseen parameter values and noise. The approach enables accurate, real-time high-dimensional state estimation for parametric PDEs without ROM closure terms or knowledge of the parameter value, which makes it attractive for embedded control with sparse sensing.
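The DMD ROM described above can be sketched in a few lines of NumPy. This is a minimal illustration assuming standard projected DMD: consecutive snapshots are stacked as columns of a matrix, the basis ${\bm{U}}$ comes from a rank-$r$ truncated SVD, ${\bm{A}}_r = {\bm{U}}^\top {\bm{Z}}_2 {\bm{V}} {\bm{\Sigma}}^{-1}$, and ${\bm{C}}_r = {\bm{C}}{\bm{U}}$ for a sensor matrix ${\bm{C}}$. The helper name `fit_dmd_rom` and the snapshot layout are hypothetical, and the paper's exact DMD variant may differ.

```python
import numpy as np


def fit_dmd_rom(Z, C, r):
    """Sketch of a projected-DMD reduced-order model (hypothetical helper).

    Z : (n, m) array of consecutive real snapshots z_0, ..., z_{m-1} (columns).
    C : (p, n) sensor matrix, so that y_k = C z_k.
    r : reduced dimension.

    Returns U (n, r), A_r (r, r), C_r (p, r) with z_k ~ U x_k,
    x_k ~ A_r x_{k-1} and y_k ~ C_r x_k.
    """
    Z1, Z2 = Z[:, :-1], Z[:, 1:]          # snapshot pairs (z_{k-1}, z_k)
    W, s, Vh = np.linalg.svd(Z1, full_matrices=False)
    U = W[:, :r]                          # leading r left singular vectors (basis)
    V = Vh[:r, :].T                       # matching right singular vectors
    A_r = U.T @ Z2 @ V / s[:r]            # U^T Z2 V S^{-1}: divide column j by s_j
    C_r = C @ U                           # measurement operator in reduced space
    return U, A_r, C_r
```

When the snapshots come from several values of $\mu$, as in the domain-randomized training set, one would stack the pairs $({\bm{z}}_{k-1}, {\bm{z}}_k)$ per trajectory rather than slicing one concatenated matrix, so that no pair spans two trajectories.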

Abstract

In systems governed by nonlinear partial differential equations, such as fluid flows, the design of state estimators such as Kalman filters relies on a reduced-order model (ROM) that projects the original high-dimensional dynamics onto a computationally tractable low-dimensional space. However, ROMs are prone to large errors, which degrade the performance of the estimator. Here, we introduce the reinforcement learning reduced-order estimator (RL-ROE), a ROM-based estimator in which the correction term that takes in the measurements is given by a nonlinear policy trained through reinforcement learning. The nonlinearity of the policy enables the RL-ROE to compensate efficiently for ROM errors, while still taking advantage of the imperfect knowledge of the dynamics. Using examples involving the Burgers and Navier-Stokes equations, we show that in the limit of very few sensors, the trained RL-ROE outperforms a Kalman filter designed using the same ROM. Moreover, it yields accurate high-dimensional state estimates for trajectories corresponding to various physical parameter values, without direct knowledge of the latter.
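To make the contrast between the two estimators concrete, here is a minimal NumPy sketch of one update step of each. It assumes the RL-ROE update has the additive form $\hat{\bm{x}}_k = {\bm{A}}_r\hat{\bm{x}}_{k-1} + {\bm{a}}_k$ with ${\bm{a}}_k$ computed from ${\bm{y}}_k$ and $\hat{\bm{x}}_{k-1}$, and it uses a textbook Kalman filter with assumed noise covariances `Q` and `R` for the KF-ROE. The function names and the small tanh MLP standing in for the trained policy are hypothetical; a deterministic action (for example the mean of a stochastic PPO policy) is assumed, and PPO training itself is not shown.

```python
import numpy as np


def kf_roe_step(x_prev, P_prev, y, A_r, C_r, Q, R):
    """One textbook Kalman filter step on the ROM (KF-ROE baseline sketch).

    Q and R are assumed process- and measurement-noise covariances.
    """
    x_pred = A_r @ x_prev                      # predict with ROM dynamics
    P_pred = A_r @ P_prev @ A_r.T + Q
    S = C_r @ P_pred @ C_r.T + R               # innovation covariance
    K = np.linalg.solve(S, C_r @ P_pred).T     # K = P_pred C_r^T S^{-1} (S, P symmetric)
    x_new = x_pred + K @ (y - C_r @ x_pred)    # linear correction of the innovation
    P_new = (np.eye(x_prev.size) - K @ C_r) @ P_pred
    return x_new, P_new


def rl_roe_step(x_prev, y, A_r, policy):
    """One RL-ROE step, assuming x_k = A_r x_{k-1} + a_k with a_k = policy(y_k, x_{k-1})."""
    a = policy(y, x_prev)                      # learned (possibly nonlinear) correction
    return A_r @ x_prev + a


def make_mlp_policy(p, r, hidden=32, seed=0):
    """Hypothetical untrained tanh MLP standing in for pi_theta (no PPO training here)."""
    rng = np.random.default_rng(seed)
    W1 = 0.1 * rng.standard_normal((hidden, p + r))
    W2 = 0.1 * rng.standard_normal((r, hidden))

    def policy(y, x_prev):
        h = np.tanh(W1 @ np.concatenate([y, x_prev]))  # input: measurements + prior estimate
        return W2 @ h                                  # deterministic action in R^r

    return policy
```

With a fixed gain matrix `K`, passing `policy=lambda y, x: K @ (y - C_r @ A_r @ x)` makes `rl_roe_step` perform a constant-gain Kalman-type correction, so a linear-gain update is a special case of the policy-based one; the learned nonlinear policy is free to depart from it to compensate for ROM errors.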

Paper Structure

This paper contains 24 sections, 32 equations, and 15 figures.

Introduction
General methodology
Problem formulation
Reduced-order model
Reinforcement learning-based reduced-order estimator
Summary of the proposed methodology
Offline training methodology
Results
Burgers equation
Navier-Stokes equations
Related work
Conclusions
Dynamic Mode Decomposition
Bayesian interpretation
Bayesian optimal filter
...and 9 more sections

Figures (15)

Figure 1: Overview of the proposed RL-ROE methodology
Figure 2: Burgers equation with $p = 4$ sensors. Normalized $L_2$ error of the RL-ROE and KF-ROE for the estimation of trajectories corresponding to values of $\mu$ not seen during training.
Figure 3: Burgers equation with $p = 4$ sensors. Ground-truth (reference) trajectories for values of $\mu$ not seen during training and corresponding RL-ROE and KF-ROE estimates. The dashed lines on the reference trajectory plots indicate the sensor data seen by the RL-ROE and KF-ROE.
Figure 4: Burgers equation. Left: Average over time of the normalized $L_2$ error versus $\mu$ when using $p = 4$ sensors; values of $\mu$ present in ${\bm{Z}}_\mathrm{train}$ are shown as large circles. Right: Average over time and over $\mu$ of the normalized $L_2$ error versus the number of sensors $p$.
Figure 5: Navier-Stokes equations with $p = 3$ sensors. Normalized $L_2$ error of the RL-ROE and KF-ROE for the estimation of trajectories corresponding to values of $Re$ not seen during training.
...and 10 more figures
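The normalized $L_2$ error reported in these figures can be computed as below, assuming it is defined per time step as $\|{\bm{z}}_k - {\bm{U}}\hat{\bm{x}}_k\|_2 / \|{\bm{z}}_k\|_2$ and averaged over time and then over $\mu$ as in Figure 4; the estimate array would be `U @ X_hat` for a ROM estimate `X_hat`. The helper names are hypothetical. `projection_error` gives the error of the best approximation of each ${\bm{z}}_k$ in the span of an orthonormal ${\bm{U}}$, which no estimate of the form ${\bm{U}}\hat{\bm{x}}_k$ can beat; treating it as the "lower bound" mentioned in the TL;DR is an assumption.

```python
import numpy as np


def normalized_l2_error(Z_true, Z_est):
    """Per-time-step ||z_k - z_hat_k||_2 / ||z_k||_2 for (n, T) arrays (assumed definition)."""
    return np.linalg.norm(Z_true - Z_est, axis=0) / np.linalg.norm(Z_true, axis=0)


def projection_error(Z_true, U):
    """Normalized error of the best approximation of each z_k in span(U), U orthonormal."""
    return normalized_l2_error(Z_true, U @ (U.T @ Z_true))


def average_over_mu(trajectories):
    """trajectories: dict mu -> (Z_true, Z_est). Returns per-mu time averages and their mean."""
    per_mu = {mu: float(normalized_l2_error(Zt, Ze).mean())
              for mu, (Zt, Ze) in trajectories.items()}
    return per_mu, float(np.mean(list(per_mu.values())))
```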
