Data-Assimilated Model-Based Reinforcement Learning for Partially Observed Chaotic Flows

Defne E. Ozan, Andrea Nóvoa, Luca Magri

TL;DR

The paper addresses the stabilization of chaotic, high-dimensional flows when full state information is unavailable. It introduces a data-assimilated model-based RL framework (DA-MBRL) that combines three components: a control-aware Echo State Network that learns the dynamics, an Ensemble Kalman Filter that estimates the state, and an off-policy actor-critic RL algorithm (DDPG) that learns the control policy. The authors demonstrate that DA-MBRL stabilizes the Kuramoto–Sivashinsky equation from noisy, partial observations, whereas model-free RL fails at low sensor counts, which illustrates improved data efficiency and robustness to partial observability. The framework is a step toward real-time control of partially observed chaotic systems and toward applying RL to more complex flow problems.

Abstract

The goal of many applications in energy and transport sectors is to control turbulent flows. However, because of chaotic dynamics and high dimensionality, the control of turbulent flows is exceedingly difficult. Model-free reinforcement learning (RL) methods can discover optimal control policies by interacting with the environment, but they require full state information, which is often unavailable in experimental settings. We propose a data-assimilated model-based RL (DA-MBRL) framework for systems with partial observability and noisy measurements. Our framework employs a control-aware Echo State Network for data-driven prediction of the dynamics, and integrates data assimilation with an Ensemble Kalman Filter for real-time state estimation. An off-policy actor-critic algorithm is employed to learn optimal control strategies from state estimates. The framework is tested on the Kuramoto-Sivashinsky equation, demonstrating its effectiveness in stabilizing a spatiotemporally chaotic flow from noisy and partial measurements.
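
The state-estimation step mentioned in the abstract can be illustrated with a minimal sketch of a stochastic (perturbed-observation) Ensemble Kalman Filter analysis update. This is not the authors' implementation: the function name `enkf_analysis`, the linear "pick sensor locations" observation operator, the ensemble size, and the synthetic sine-wave truth are all assumptions made for illustration, and the sketch updates a generic state vector rather than any particular model state.

```python
import numpy as np


def enkf_analysis(ensemble, y_obs, obs_idx, obs_noise_std, rng):
    """One stochastic EnKF analysis step (hypothetical helper, for illustration).

    ensemble      : (n_state, n_members) forecast ensemble
    y_obs         : (n_obs,) noisy measurements
    obs_idx       : indices of the observed state components (assumed linear
                    observation operator that selects sensor locations)
    obs_noise_std : assumed standard deviation of the measurement noise
    rng           : numpy random Generator used to perturb the observations
    """
    n_members = ensemble.shape[1]
    n_obs = len(obs_idx)

    # Ensemble anomalies (deviations from the ensemble mean).
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    obs_anomalies = anomalies[obs_idx, :]

    # Sample covariances: state-observation and observation-observation.
    c_xy = anomalies @ obs_anomalies.T / (n_members - 1)
    c_yy = obs_anomalies @ obs_anomalies.T / (n_members - 1)
    r = obs_noise_std**2 * np.eye(n_obs)

    # Kalman gain K = C_xy (C_yy + R)^{-1}; (C_yy + R) is symmetric,
    # so we solve a linear system instead of forming the inverse.
    gain = np.linalg.solve(c_yy + r, c_xy.T).T

    # Each member assimilates its own perturbed copy of the observations.
    y_pert = y_obs[:, None] + obs_noise_std * rng.standard_normal((n_obs, n_members))
    return ensemble + gain @ (y_pert - ensemble[obs_idx, :])


# Synthetic demo (all values below are made up for illustration).
rng = np.random.default_rng(0)
n_state, n_members = 64, 50
obs_idx = np.array([0, 16, 32, 48])          # 4 equally spaced sensors
truth = np.sin(2 * np.pi * np.arange(n_state) / n_state)
prior = rng.standard_normal((n_state, n_members))
y_obs = truth[obs_idx] + 0.1 * rng.standard_normal(obs_idx.size)
posterior = enkf_analysis(prior, y_obs, obs_idx, 0.1, rng)
```

In this toy demo the prior members are uncorrelated noise, so the update mainly corrects the observed components. Corrections at unobserved locations rely on cross-covariances in the forecast ensemble, which in a model-based setting would come from propagating the ensemble through the learned dynamics.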

Paper Structure

This paper contains 5 sections, 9 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Training performance of (a) model-free RL and (b) DA-MBRL on the KS equation with different numbers of sensors. Shown are the non-discounted returns of training episodes across 5 runs (mean and 1 standard deviation). Ref. indicates the maximum return achieved by the model-free algorithm with full observability (i.e., $n_o = n_s = 64$).
  • Figure 2: Evaluation episode of DA-MBRL on the KS equation with 4 sensors. (a) The true state of the system, (b) the estimated state, (c) the reconstruction error, and (d-g) the observations and their ensemble estimates. The triangles in (a,b) indicate the actuator locations. Observations and control begin after 500 time steps.