Data-assimilated model-informed reinforcement learning

Defne E. Ozan, Andrea Nóvoa, Georgios Rigas, Luca Magri

TL;DR

This work tackles controlling spatio-temporally chaotic systems under partial and noisy observability by proposing DA-MIRL, a framework that couples a predictive environment model, ensemble Kalman filter state estimation, and an off-policy actor-critic RL agent. By integrating a physics-based (truncated Fourier) or a data-driven (control-aware ESN) model with real-time data assimilation, the approach converts a POMDP into a practically tractable MDP for learning. The KS equation serves as a challenging testbed, where DA-MIRL achieves robust stabilization with significantly fewer sensors than model-free RL and maintains performance across varying chaotic regimes. The results highlight the practical impact of modular observer-controller design, enabling real-time control of complex chaotic dynamics in scenarios with limited sensing and noisy data. Overall, the framework broadens the applicability of RL-based control to partially observable, high-dimensional chaotic systems, with scalable, open-source implementations.

Abstract

The control of spatio-temporal chaos is challenging because of high dimensionality and unpredictability. Model-free reinforcement learning (RL) discovers optimal control policies by interacting with the system, typically requiring observations of the full physical state. In practice, sensors often provide only partial and noisy measurements (observations) of the system. The objective of this paper is to develop a framework that enables the control of chaotic systems with partial and noisy observability. The proposed method, data-assimilated model-informed reinforcement learning (DA-MIRL), integrates (i) low-order models to approximate high-dimensional dynamics; (ii) sequential data assimilation to correct the model prediction when observations become available; and (iii) an off-policy actor-critic RL algorithm to adaptively learn an optimal control strategy based on the corrected state estimates. We test DA-MIRL on the spatio-temporally chaotic solutions of the Kuramoto-Sivashinsky equation. We estimate the full state of the environment with (i) a physics-based model, here a coarse-grained model; and (ii) a data-driven model, here the control-aware echo state network, which is proposed in this paper. We show that DA-MIRL successfully estimates and suppresses the chaotic dynamics of the environment in real time from partial observations and approximate models. This work opens opportunities for the control of partially observable chaotic systems.
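
The interplay of the three components (model forecast, sequential data assimilation, and acting on the corrected estimate) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy linear dynamics, the matrices `M_true`, `M_model`, `B`, `H`, the fixed linear gain `K` standing in for a trained actor, the additive forecast noise, and the function names `enkf_analysis` and `run_loop` are all assumptions made for illustration. The update shown is a stochastic (perturbed-observation) ensemble Kalman filter.

```python
import numpy as np

def enkf_analysis(ens, obs, H, obs_std, rng):
    """Stochastic (perturbed-observation) EnKF update.

    ens: (n_state, m) forecast ensemble; obs: (n_obs,); H: (n_obs, n_state).
    Returns the analysis ensemble with the same shape as ens.
    """
    n_obs, m = H.shape[0], ens.shape[1]
    A = ens - ens.mean(axis=1, keepdims=True)   # state anomalies
    HA = H @ A                                  # anomalies in observation space
    R = obs_std**2 * np.eye(n_obs)              # observation-error covariance
    C_oo = HA @ HA.T / (m - 1) + R              # innovation covariance
    C_so = A @ HA.T / (m - 1)                   # state-observation covariance
    # Perturb the observation independently for each ensemble member.
    D = obs[:, None] + obs_std * rng.standard_normal((n_obs, m))
    return ens + C_so @ np.linalg.solve(C_oo, D - H @ ens)

def run_loop(n_steps=200, obs_every=5, m=20, obs_std=0.05, seed=0):
    """Forecast / assimilate / act loop on a hypothetical toy system."""
    rng = np.random.default_rng(seed)
    n_state = 4
    shift = 0.05 * np.diag(np.ones(n_state - 1), 1)
    M_true = 1.02 * np.eye(n_state) + shift     # toy environment (not the KS equation)
    M_model = 1.01 * np.eye(n_state) + shift    # deliberately imperfect model
    B = np.eye(n_state)                         # actuation matrix (assumption)
    H = np.eye(n_state)[[0, 2]]                 # sensors see components 0 and 2 only
    K = 0.5 * np.eye(n_state)                   # placeholder policy, not a trained actor

    s_true = rng.standard_normal(n_state)
    ens = rng.standard_normal((n_state, m))     # initial ensemble of state estimates
    for k in range(n_steps):
        a = -K @ ens.mean(axis=1)               # act on the ensemble mean
        s_true = M_true @ s_true + B @ a        # environment step
        ens = M_model @ ens + (B @ a)[:, None]  # model forecast of every member
        ens += 0.01 * rng.standard_normal(ens.shape)  # keeps ensemble spread (assumption)
        if (k + 1) % obs_every == 0:            # observations arrive intermittently
            obs = H @ s_true + obs_std * rng.standard_normal(H.shape[0])
            ens = enkf_analysis(ens, obs, H, obs_std, rng)
    return s_true, ens

if __name__ == "__main__":
    s_true, ens = run_loop()
    print("estimate error:", np.linalg.norm(s_true - ens.mean(axis=1)))
```

Between observations the model runs on its own, and each assimilation step re-initializes the ensemble with the analysis members. In the full method the gain `K` would be replaced by the actor network and the toy matrices by the coarse-grained or echo-state-network model.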

Paper Structure

This paper contains 29 sections, 35 equations, 15 figures, 3 tables, 3 algorithms.

Figures (15)

  • Figure 1: Three-block schematic of reinforcement learning methods under partial observability. Comparison between (a) model-free RL with belief state and (b) the proposed data-assimilated model-informed RL.
  • Figure 2: Detailed schematic of the proposed DA-MIRL, which aims to control a partially observed environment (gray). DA-MIRL integrates three components: model environment (orange), state estimation (blue) and agent (green). The numerical model $\bm{F}$ approximates the environment by forecasting an ensemble of states $\bm{s}_j$ (stacked boxes). If there are no observations, the model runs autonomously; otherwise, the model prediction is updated by the state estimator. The state estimator perturbs the observations $\bm{o}$ and assimilates them with the forecast ensemble via the ensemble Kalman filter (EnKF), which results in the analysis ensemble $\bm{s}^\mathrm{a}_j$. The model $\bm{F}$ is then re-initialized with $\bm{s}^\mathrm{a}_j$. The actor-critic agent interacts with both the environment and its model to determine and apply the optimal action at any time $t_k$. The critic, which is active only during training, approximates the $Q$-value function from the state-action pair. The actor (i.e., the policy $\bm{\pi}$) determines the action $\bm{a}$ from the expected value of the full state of the environment. Because we do not have access to the environment's full state, we feed the actor the expected value given by the model, i.e., the ensemble mean $\bar{\bm{s}}$.
  • Figure 3: Control-aware echo state network. (a) Compact schematic of the architecture, and (b) pictorial forecast of the network with one open-loop followed by two closed-loop steps.
  • Figure 4: Model-free RL: training performance for varying (a) number of sensors; (b) levels of observation error, $\bm{\sigma}_{o}$; and (c) frequency of observations, $\Delta k_o$. Mean and standard deviation of the performance metrics across 5 different runs: (I) return $R$, (II) critic loss $L_Q$, (III) policy loss $L_\pi$. Default parameters: $n_a=8$, $n_o=64$, $\sigma_o=0.0\%$, $\Delta k_o=1$.
  • Figure 5: Numerical simulations of the KS equation with varying number of Fourier modes using random actuations. Spatio-temporal evolution of the KS equation and reconstruction error using (a) $n^{true}_f = 64$ and (b)-(e) truncated modes $n_f = \{32, 20, 16, 14\}$. (f) Energy spectrum $E(\kappa_l) = \frac{1}{5000}\sum_{k = 1}^{5000}|c_l(k)|^2$, where $l = 1,2,\dots, n_f/2-1$.
  • ...and 10 more figures
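
The time-averaged energy spectrum in the Figure 5 caption, $E(\kappa_l) = \frac{1}{5000}\sum_{k=1}^{5000}|c_l(k)|^2$ for $l = 1, 2, \dots, n_f/2-1$, can be computed from a snapshot matrix as in the following sketch. The function name `energy_spectrum`, the use of a real FFT on a uniform periodic grid, and the $1/n_x$ normalization of the Fourier coefficients $c_l$ are assumptions; the source does not state the normalization.

```python
import numpy as np

def energy_spectrum(u, n_f):
    """Time-averaged energy per Fourier mode.

    u: (n_t, n_x) array of snapshots on a uniform periodic grid.
    n_f: number of Fourier modes; modes l = 1, ..., n_f/2 - 1 are returned.
    """
    n_t, n_x = u.shape
    c = np.fft.rfft(u, axis=1) / n_x        # coefficients c_l(k); normalization is an assumption
    modes = np.arange(1, n_f // 2)          # l = 1, ..., n_f/2 - 1
    E = np.mean(np.abs(c[:, modes]) ** 2, axis=0)  # average |c_l|^2 over the n_t snapshots
    return modes, E

if __name__ == "__main__":
    # Hypothetical test signal: a single cosine at wavenumber index 2.
    x = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
    u = np.tile(np.cos(2.0 * x), (100, 1))
    modes, E = energy_spectrum(u, n_f=64)
    print(modes[np.argmax(E)], E.max())
```

With this normalization, a unit-amplitude cosine at mode index 2 gives $|c_2|^2 = 0.25$ and negligible energy elsewhere, which is a quick sanity check before applying the function to simulation data. Averaging over 5000 snapshots, as in the caption, only requires passing an array with `n_t = 5000`.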