Koopman-based surrogate modeling for reinforcement-learning control of Rayleigh-Bénard convection

Tim Plotzki, Sebastian Peitz

Abstract

Training reinforcement learning (RL) agents to control fluid dynamics systems is computationally expensive due to the high cost of direct numerical simulations (DNS) of the governing equations. Surrogate models offer a promising alternative by approximating the dynamics at a fraction of the computational cost, but their feasibility as training environments for RL is limited by distribution shifts, as policies induce state distributions not covered by the surrogate training data. In this work, we investigate the use of Linear Recurrent Autoencoder Networks (LRANs) for accelerating RL-based control of 2D Rayleigh-Bénard convection. We evaluate two training strategies: a surrogate trained on precomputed data generated with random actions, and a policy-aware surrogate trained iteratively using data collected from an evolving policy. Our results show that while surrogate-only training leads to reduced control performance, combining surrogates with DNS in a pretraining scheme recovers state-of-the-art performance while reducing training time by more than 40%. We demonstrate that policy-aware training mitigates the effects of distribution shift, enabling more accurate predictions in policy-relevant regions of the state space.

Paper Structure

This paper contains 14 sections, 3 equations, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: Extension of the LRAN architecture to incorporate control actions as additional inputs. The action vector $\mathbf{a}_t$ is added to the next hidden state after an affine transformation defined by an input-to-hidden matrix $U$ and a bias term $\mathbf{b}_u$ (see the first sketch after this list).
  • Figure 2: Training loop of the policy-aware surrogate training scheme. The surrogate model is optimized with DNS data generated using actions from the policy, while the policy is optimized by interacting with the surrogate via PPO (a skeleton of this loop follows the list).
  • Figure 3: Control performance of policies from both surrogates over the course of training.
  • Figure 4: Qualitative comparison of both LRANs. (a) Surrogates predict $200$ steps with zero action input from an initial four-cell RBC state. (b) Surrogates predict one step with zero action input from a two-cell RBC state. Control with zero action inputs is equivalent to uncontrolled forward prediction.
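
The caption of Figure 1 describes how control enters the otherwise linear latent dynamics: the action is mapped through an affine transformation ($U$, $\mathbf{b}_u$) and added to the next hidden state. A minimal PyTorch sketch of such a controlled LRAN is given below; the layer widths, latent dimension, class name `ControlledLRAN`, and the `rollout` interface are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ControlledLRAN(nn.Module):
    """Minimal sketch of an LRAN with additive control input (cf. Figure 1).

    Layer sizes and the rollout interface are illustrative assumptions,
    not the paper's exact configuration.
    """

    def __init__(self, state_dim, action_dim, latent_dim):
        super().__init__()
        # Nonlinear encoder/decoder pair (the autoencoder part).
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim)
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, state_dim)
        )
        # Linear (Koopman-like) latent dynamics with additive control:
        #   z_{t+1} = A z_t + U a_t + b_u
        self.A = nn.Linear(latent_dim, latent_dim, bias=False)
        self.U = nn.Linear(action_dim, latent_dim, bias=True)

    def rollout(self, x0, actions):
        """Encode the initial state once, then iterate the linear latent
        dynamics, decoding a predicted state at every step."""
        z = self.encoder(x0)
        predictions = []
        for a in actions:              # actions: (T, batch, action_dim)
            z = self.A(z) + self.U(a)  # affine action injection from Figure 1
            predictions.append(self.decoder(z))
        return torch.stack(predictions)
```

Keeping the latent transition matrix $A$ linear is what makes the model Koopman-like: a long rollout reduces to repeated matrix-vector products in latent space, which is why the surrogate is cheap to evaluate compared to a DNS step.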
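
The caption of Figure 2 outlines the alternation at the heart of policy-aware training: collect DNS data under the current policy, refit the surrogate, then update the policy with PPO against the surrogate. The skeleton below sketches one way to organize that loop; `dns_env`, `surrogate`, `agent`, and all of their methods (`reset`, `step`, `fit`, `train`) are hypothetical interfaces used for illustration, not the paper's actual API.

```python
def rollout_episode(env, policy):
    """Run one episode and collect the transitions the surrogate needs.

    `env` and `policy` follow an assumed gym-like interface."""
    transitions, obs, done = [], env.reset(), False
    while not done:
        action = policy(obs)
        next_obs, reward, done = env.step(action)
        transitions.append((obs, action, next_obs))
        obs = next_obs
    return transitions

def policy_aware_training(dns_env, surrogate, agent,
                          n_iters, n_dns_episodes, n_surrogate_steps):
    """Alternating scheme sketched in Figure 2 (hypothetical interfaces)."""
    dataset = []
    for _ in range(n_iters):
        # 1) Drive the expensive DNS with the *current* policy so the
        #    surrogate is trained on the state distribution the policy
        #    actually induces (mitigating distribution shift).
        for _ in range(n_dns_episodes):
            dataset.extend(rollout_episode(dns_env, agent.policy))

        # 2) Refit the LRAN surrogate on the accumulated data.
        surrogate.fit(dataset)

        # 3) Update the policy cheaply with PPO, interacting only with
        #    the surrogate instead of the DNS.
        agent.train(env=surrogate, total_steps=n_surrogate_steps)
    return agent
```

The key design choice, per the abstract, is step 1: because the data-collection policy evolves together with the agent, the surrogate keeps seeing policy-relevant regions of the state space rather than only the states reached by random actions.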