Improving the Robustness of Control of Chaotic Convective Flows with Domain-Informed Reinforcement Learning
Michiel Straat, Thorben Markmann, Sebastian Peitz, Barbara Hammer
TL;DR
This work tackles robust control of chaotic Rayleigh-Bénard convection using reinforcement learning. It introduces domain-informed reward shaping that encourages Bénard cell merging, implemented with Proximal Policy Optimization to control bottom heaters based on partial observations. The domain-informed approach yields faster training, better generalization across initial conditions, and stable, reduced-convection states even in highly chaotic regimes, surpassing conventional linear controllers and uninformed RL. The findings suggest that incorporating domain-informed priors is an important step toward practical deployment of RL-based flow control, with potential extensions to 3D turbulent convection and surrogate-model training to further reduce sample needs.
Abstract
Chaotic convective flows arise in many real-world systems, such as microfluidic devices and chemical reactors. Stabilizing these flows is highly desirable but remains challenging, particularly in chaotic regimes where conventional control methods often fail. Reinforcement Learning (RL) has shown promise for control in laminar flow settings, but its ability to generalize and remain robust under chaotic and turbulent dynamics is not well explored, despite being critical for real-world deployment. In this work, we improve the practical feasibility of RL-based control of such flows, focusing on Rayleigh-Bénard Convection (RBC), a canonical model for convective heat transport. To enhance generalization and sample efficiency, we introduce domain-informed RL agents that are trained using Proximal Policy Optimization across diverse initial conditions and flow regimes. We incorporate domain knowledge in the reward function via a term that encourages Bénard cell merging, as an example of a desirable macroscopic property. In laminar flow regimes, the domain-informed RL agents reduce convective heat transport by up to 33%, and in chaotic flow regimes they still achieve a 10% reduction, significantly outperforming the conventional controllers used in practice. Comparing domain-informed with uninformed agents, we find that the domain-informed reward design yields steady flows, faster convergence during training, and generalization across flow regimes without retraining. Our work demonstrates that elegant domain-informed priors can greatly enhance the robustness of RL-based control of chaotic flows, bringing real-world deployment closer.
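To make the idea of a domain-informed reward concrete, the following is a minimal sketch and not the authors' implementation. It assumes that convective heat transport is measured by a Nusselt number, that the domain is horizontally periodic, and that the number of convection cells can be estimated from sign changes of the vertical velocity along the mid-height line. The names `count_cells`, `shaped_reward`, and `merge_weight` are hypothetical and chosen only for illustration.

```python
import numpy as np


def count_cells(w_mid: np.ndarray) -> int:
    """Estimate the number of convection rolls (assumed proxy).

    w_mid: vertical velocity sampled along the mid-height line of a
    horizontally periodic domain. Each boundary between neighbouring
    counter-rotating rolls appears as a sign change of w_mid.
    """
    signs = np.sign(w_mid)
    signs = signs[signs != 0]  # drop exact zeros so they do not count twice
    if signs.size == 0:
        return 0
    # Compare every sample with its left neighbour, wrapping around
    # because the domain is assumed to be periodic.
    return int(np.count_nonzero(signs != np.roll(signs, 1)))


def shaped_reward(nusselt: float, w_mid: np.ndarray,
                  merge_weight: float = 0.1) -> float:
    """Hypothetical shaped reward.

    The first term penalizes convective heat transport; the second term
    penalizes a large number of cells, so states in which cells have
    merged receive a higher reward. merge_weight is an assumed tuning
    parameter, not a value from the paper.
    """
    return -nusselt - merge_weight * count_cells(w_mid)


# Synthetic mid-height velocity profiles with 4 and 2 rolls.
x = (np.arange(64) + 0.5) / 64
w_four_rolls = np.sin(2 * np.pi * 2 * x)
w_two_rolls = np.sin(2 * np.pi * 1 * x)
```

With these synthetic profiles and the same Nusselt number, the state with fewer rolls receives the higher reward, which is the qualitative behaviour a cell-merging term is meant to produce. A function of this kind would be evaluated at each environment step and passed to the PPO learner as the scalar reward.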
