Reinforcement learning-based estimation for partial differential equations
Saviz Mowlavi, Mouhacine Benosman
TL;DR
This work addresses state estimation for high-dimensional, parametric PDEs from sparse measurements. It introduces the RL-ROE, a reduced-order estimator that replaces the Kalman gain with a nonlinear policy learned by reinforcement learning. The estimator is built on a Dynamic Mode Decomposition ROM ${\bm{z}}_k \approx {\bm{U}}{\bm{x}}_k$ with ${\bm{x}}_k = {\bm{A}}_r{\bm{x}}_{k-1} + {\bm{w}}_{k-1}$ and ${\bm{y}}_k = {\bm{C}}_r{\bm{x}}_k + {\bm{v}}_k$. The policy ${\boldsymbol{\pi}}_{\boldsymbol{\theta}}$ produces corrective actions ${\bm{a}}_k$ from the measurements ${\bm{y}}_k$ and the prior estimates, and is trained offline via PPO with domain randomization over the parameter $\mu$ so that it generalizes across regimes. Results on the Burgers and Navier-Stokes equations show that the RL-ROE substantially outperforms a Kalman-based ROE when sensors are few, achieving near-lower-bound accuracy and robust performance across unseen parameter values and noise. The approach enables accurate, real-time high-dimensional state estimation for parametric PDEs without needing ROM closure terms or full parameter knowledge, making it attractive for embedded control with sparse sensing.
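To make the structure of the estimator concrete, here is a minimal sketch in Python of one estimation step. It assumes the update takes the form $\hat{\bm{x}}_k = {\bm{A}}_r\hat{\bm{x}}_{k-1} + {\bm{a}}_k$, which is one reading of the description above and not a confirmed detail of the authors' implementation. The function names, the fixed gain `K`, and the untrained `toy_policy` are hypothetical placeholders; a real RL-ROE would use a policy trained with PPO, and a real Kalman filter would use a time-varying gain.

```python
import numpy as np

def roe_step(x_prev, y, A_r, correction):
    """One reduced-order estimator step: ROM prediction plus a correction term."""
    x_pred = A_r @ x_prev                      # propagate the previous estimate through the ROM
    return x_pred + correction(x_pred, x_prev, y)

def kalman_correction(K, C_r):
    """Linear correction K (y - C_r x_pred). K is held fixed here for simplicity (assumption)."""
    def correction(x_pred, x_prev, y):
        return K @ (y - C_r @ x_pred)
    return correction

def policy_correction(policy):
    """Nonlinear correction a_k = policy([y_k, x_prev]); the input layout is an assumption."""
    def correction(x_pred, x_prev, y):
        return policy(np.concatenate([y, x_prev]))
    return correction

# Toy 2-state ROM with one sensor (all values are illustrative, not from the paper).
A_r = np.array([[0.9, 0.1], [0.0, 0.8]])
C_r = np.array([[1.0, 0.0]])
K = np.array([[1.0], [0.0]])

# Stand-in for a trained policy: a random, untrained single-layer map.
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((2, 3))
toy_policy = lambda obs: np.tanh(W @ obs)

x_prev = np.array([1.0, 2.0])
y = np.array([1.5])
x_kf = roe_step(x_prev, y, A_r, kalman_correction(K, C_r))
x_rl = roe_step(x_prev, y, A_r, policy_correction(toy_policy))
```

Both estimators share the same ROM prediction and differ only in the correction term, which is the design choice the RL-ROE exploits: the linear gain is swapped for a learned nonlinear map while the imperfect model ${\bm{A}}_r$ is retained.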
Abstract
In systems governed by nonlinear partial differential equations such as fluid flows, the design of state estimators such as Kalman filters relies on a reduced-order model (ROM) that projects the original high-dimensional dynamics onto a computationally tractable low-dimensional space. However, ROMs are prone to large errors, which degrade the performance of the estimator. Here, we introduce the reinforcement learning reduced-order estimator (RL-ROE), a ROM-based estimator in which the correction term that takes in the measurements is given by a nonlinear policy trained through reinforcement learning. The nonlinearity of the policy enables the RL-ROE to compensate efficiently for errors of the ROM, while still taking advantage of the imperfect knowledge of the dynamics. Using examples involving the Burgers and Navier-Stokes equations, we show that in the limit of very few sensors, the trained RL-ROE outperforms a Kalman filter designed using the same ROM. Moreover, it yields accurate high-dimensional state estimates for trajectories corresponding to various physical parameter values, without direct knowledge of those values.
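As an illustration of how a ROM can project high-dimensional dynamics onto a low-dimensional space, the sketch below builds a linear ROM with standard SVD-based Dynamic Mode Decomposition. It is a generic textbook construction under stated assumptions, not necessarily the authors' exact procedure: `dmd_rom`, the snapshot layout (one state per column), and the sensor matrix `C` are hypothetical names and choices, and the toy data is synthetic.

```python
import numpy as np

def dmd_rom(snapshots, C, r):
    """Build a rank-r linear ROM from a snapshot matrix whose columns are z_0, ..., z_m.

    Returns the basis U_r (z ~ U_r x), the reduced dynamics A_r, and the
    reduced measurement matrix C_r = C U_r.
    """
    Z0, Z1 = snapshots[:, :-1], snapshots[:, 1:]       # time-shifted snapshot pairs
    U, s, Vt = np.linalg.svd(Z0, full_matrices=False)
    U_r, s_r, V_r = U[:, :r], s[:r], Vt[:r].T          # truncate to the r leading modes
    # A_r = U_r^T Z1 V_r Sigma_r^{-1}; broadcasting divides column j by s_r[j].
    A_r = U_r.T @ Z1 @ V_r / s_r
    C_r = C @ U_r
    return U_r, A_r, C_r

# Synthetic data: a damped rotation in a 2-D subspace embedded in 6 dimensions.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((6, 2)))       # orthonormal embedding
theta = 0.3
A_true = 0.95 * np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]])
x = np.array([1.0, 0.0])
cols = []
for _ in range(20):
    cols.append(Q @ x)
    x = A_true @ x
snapshots = np.stack(cols, axis=1)                     # shape (6, 20)

C = np.eye(6)[[0, 3]]                                  # two point sensors (illustrative)
U_r, A_r, C_r = dmd_rom(snapshots, C, r=2)
```

For this exactly linear, exactly rank-2 toy system the ROM reproduces the snapshots without error. For a nonlinear PDE the same construction is only an approximation, which is the source of the ROM errors that the learned correction is meant to compensate for.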
