Improving the Robustness of Control of Chaotic Convective Flows with Domain-Informed Reinforcement Learning

Michiel Straat, Thorben Markmann, Sebastian Peitz, Barbara Hammer

TL;DR

This work tackles robust control of chaotic Rayleigh-Bénard convection using reinforcement learning. It introduces domain-informed reward shaping that encourages Bénard cell merging, implemented with Proximal Policy Optimization to control bottom heaters based on partial observations. The domain-informed approach yields faster training, better generalization across initial conditions, and stable, reduced-convection states even in highly chaotic regimes, surpassing conventional linear controllers and uninformed RL. The findings suggest that incorporating physics-informed priors is essential for practical deployment of RL-based flow control, with potential extensions to 3D turbulent convection and surrogate-model training to further reduce sample needs.

Abstract

Chaotic convective flows arise in many real-world systems, such as microfluidic devices and chemical reactors. Stabilizing these flows is highly desirable but remains challenging, particularly in chaotic regimes where conventional control methods often fail. Reinforcement Learning (RL) has shown promise for control in laminar flow settings, but its ability to generalize and remain robust under chaotic and turbulent dynamics is not well explored, despite being critical for real-world deployment. In this work, we improve the practical feasibility of RL-based control of such flows, focusing on Rayleigh-Bénard Convection (RBC), a canonical model for convective heat transport. To enhance generalization and sample efficiency, we introduce domain-informed RL agents that are trained using Proximal Policy Optimization across diverse initial conditions and flow regimes. We incorporate domain knowledge in the reward function via a term that encourages Bénard cell merging, as an example of a desirable macroscopic property. In laminar flow regimes, the domain-informed RL agents reduce convective heat transport by up to 33%, and in chaotic flow regimes, they still achieve a 10% reduction, which is significantly better than the conventional controllers used in practice. We compare the domain-informed agents to uninformed agents: our results show that the domain-informed reward design results in steady flows, faster convergence during training, and generalization across flow regimes without retraining. Our work demonstrates that elegant domain-informed priors can greatly enhance the robustness of RL-based control of chaotic flows, bringing real-world deployment closer.
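
To make the reward design described above more concrete, the following is a minimal sketch of a shaped reward that adds a cell-merging term to a normalized Nusselt number term. It is an illustration under stated assumptions, not the authors' implementation: the function name `shaped_reward`, the parameters `nu_ref`, `domain_width`, and `merge_weight`, and the linear form of both terms are hypothetical, and the source does not specify how the two terms are weighted.

```python
def shaped_reward(nu, nu_ref, cell_distance, domain_width, merge_weight=0.5):
    """Hypothetical domain-informed reward (illustrative only).

    nu            -- current Nusselt number of the controlled flow
    nu_ref        -- reference Nusselt number used for normalization (assumed)
    cell_distance -- measured distance between Benard cells (assumed >= 0)
    domain_width  -- length scale used to normalize the distance (assumed)
    merge_weight  -- weight of the domain-informed term (assumed value)
    """
    # Normalized Nusselt term: zero at the reference, positive when
    # convective heat transport is reduced below the reference.
    nusselt_term = 1.0 - nu / nu_ref

    # Domain-informed term: grows as the measured cell distance shrinks,
    # which is one assumed way to encourage cell merging.
    merge_term = 1.0 - cell_distance / domain_width

    return nusselt_term + merge_weight * merge_term


if __name__ == "__main__":
    # Halving the Nusselt number with cells a quarter of the domain apart.
    print(shaped_reward(nu=2.0, nu_ref=4.0, cell_distance=1.0, domain_width=4.0))
```

Setting `merge_weight=0.0` in this sketch recovers a reward based on the Nusselt term alone, which corresponds in spirit to the uninformed agents used for comparison.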

Paper Structure

This paper contains 16 sections, 7 equations, 5 figures.

Figures (5)

  • Figure 1: Panel (a) shows the schematic of the control setup. The RL agent receives partial observations from an $8 \times 48$ grid of sensors and a midline vertical velocity measurement. Based on the observation $\bm{s}(t)$, the policy network outputs heating actions $\bm{a}(t)$ for 12 bottom actuators. Note that PPO also involves a critic network that estimates the value function. The reward combines a normalized Nusselt number reduction term with a domain-informed term that encourages cell merging via the measured cell distance. Panels (b) and (d) show a state resulting from $Ra=10^4$ together with the Nusselt number over time. Panels (c) and (e) show the same for $Ra=10^6$.
  • Figure 2: Flow control by an uninformed agent for a typical test set episode at $Ra = 10^4$.
  • Figure 3: The relative reduction of the Nusselt number with respect to the uncontrolled baseline for each control method at each $Ra$. For each test checkpoint, we computed the mean Nusselt number over the entire episode and then the percentage change relative to the mean Nusselt number of the uncontrolled baseline. Here, we show the average and standard deviation computed over the 12 test checkpoints.
  • Figure 4: The effect of Domain-Informed (DI) training vs. uninformed training (No DI) on flow control, shown by three key statistics computed on the test set. (a): The percentage of episodes where Bénard cells were merged. (b): The standard deviation of the Nusselt number over time, computed over the last 40 actions in the episode. (c): The simulation time at which the cell merging event took place. (d): The Nusselt number during a typical example test episode.
  • Figure 5: The effect of domain-informed training on generalization to other $Ra$ for an agent trained on $Ra=10^5$, shown by three key statistics computed on the test set. See Fig. 4 for an explanation of the statistics shown in (a) and (b). (c) and (d): The Nusselt number during a typical example test episode for $Ra=10^4$ and $Ra=10^6$, respectively.
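
The metric described in the Figure 3 caption can be sketched as follows. This is a minimal illustration rather than the authors' evaluation code: the function name `relative_nusselt_change` and the array layout are hypothetical, and it assumes that each controlled episode is paired with an uncontrolled baseline episode started from the same test checkpoint, which the caption does not state explicitly.

```python
import numpy as np


def relative_nusselt_change(controlled, baseline):
    """Percentage change of the episode-mean Nusselt number vs. the baseline.

    controlled, baseline -- arrays of shape (n_checkpoints, n_steps) holding
    the Nusselt number over time for each test checkpoint (assumed layout).
    Returns the mean and standard deviation over checkpoints.
    """
    controlled = np.asarray(controlled, dtype=float)
    baseline = np.asarray(baseline, dtype=float)

    # Mean Nusselt number over the entire episode, per checkpoint.
    mean_ctrl = controlled.mean(axis=1)
    mean_base = baseline.mean(axis=1)

    # Percentage change relative to the uncontrolled baseline;
    # negative values indicate reduced convective heat transport.
    pct_change = 100.0 * (mean_ctrl - mean_base) / mean_base

    # Average and (population) standard deviation over the checkpoints.
    return pct_change.mean(), pct_change.std()


if __name__ == "__main__":
    # Two toy checkpoints with constant Nusselt numbers.
    ctrl = [[2.0, 2.0], [3.0, 3.0]]
    base = [[4.0, 4.0], [4.0, 4.0]]
    print(relative_nusselt_change(ctrl, base))
```

In this sketch a negative mean corresponds to a reduction, so a value of -33 would match the "up to 33%" reduction reported for laminar regimes.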