Model-based deep reinforcement learning for accelerated learning from flow simulations
Andre Weiner, Janis Geise
TL;DR
This work addresses the high computational cost of active flow control (AFC) based on deep reinforcement learning (DRL) by proposing model-based DRL with an ensemble of environment models (MEPPO). By alternating between high-fidelity CFD trajectories and model-simulated trajectories, the approach reduces training time substantially (by more than $65\%$ for the cylinder and more than $80\%$ for the pinball) while delivering comparable or superior control performance. Results for the two benchmark flows, cylinder wake control and the fluidic pinball, indicate that model ensembles provide robustness against model error and enable efficient exploration via coordinated sampling. The findings suggest that model-based DRL with ensembles can make DRL-based AFC practical and data-efficient for more complex, industrial-scale CFD problems.
Abstract
In recent years, deep reinforcement learning has emerged as a technique to solve closed-loop flow control problems. Employing simulation-based environments in reinforcement learning enables a priori end-to-end optimization of the control system, provides a virtual testbed for safety-critical control applications, and makes it possible to gain a deep understanding of the control mechanisms. While reinforcement learning has been applied successfully in a number of rather simple flow control benchmarks, a major bottleneck toward real-world applications is the high computational cost and turnaround time of flow simulations. In this contribution, we demonstrate the benefits of model-based reinforcement learning for flow control applications. Specifically, we optimize the policy by alternating between trajectories sampled from flow simulations and trajectories sampled from an ensemble of environment models. The model-based learning reduces the overall training time by up to $85\%$ for the fluidic pinball test case. Even larger savings are expected for more demanding flow simulations.
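The alternation between simulation-sampled and model-sampled trajectories can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the scalar linear system `expensive_step` stands in for a flow simulation, the ensemble members are linear least-squares fits on bootstrap resamples rather than neural networks, a simple accept-if-better search on a feedback gain replaces the PPO policy update, and the fixed schedule `model_every` is a hypothetical switching rule.

```python
import numpy as np

rng = np.random.default_rng(0)


def expensive_step(state, action):
    """Toy stand-in for one costly simulation step (hypothetical linear dynamics)."""
    return 0.9 * state + 0.5 * action


def sample_trajectory(step_fn, gain, n_steps=20, noise=0.1):
    """Roll out the policy a = -gain * s plus exploration noise.

    Returns an array with one (state, action, next_state) row per step.
    """
    state, rows = 1.0, []
    for _ in range(n_steps):
        action = -gain * state + noise * rng.normal()
        next_state = step_fn(state, action)
        rows.append((state, action, next_state))
        state = next_state
    return np.array(rows)


def fit_ensemble(data, n_models=5):
    """Fit each ensemble member by least squares on a bootstrap resample."""
    members = []
    for _ in range(n_models):
        idx = rng.integers(0, len(data), size=len(data))
        coeffs, *_ = np.linalg.lstsq(data[idx, :2], data[idx, 2], rcond=None)
        members.append(coeffs)
    return members


def ensemble_step(members):
    """Build a step function that draws a random ensemble member at every step."""
    def step(state, action):
        a, b = members[rng.integers(len(members))]
        return a * state + b * action
    return step


def cost(trajectory):
    """Sum of squared next states; lower means better regulation toward zero."""
    return float(np.sum(trajectory[:, 2] ** 2))


def train(n_iterations=6, model_every=2):
    """Alternate between simulation-based and model-based policy updates."""
    gain, buffer, sources = 0.0, [], []
    for iteration in range(n_iterations):
        # Use the expensive simulation on schedule, or if no data exists yet.
        use_simulation = (iteration % model_every == 0) or not buffer
        if use_simulation:
            step_fn = expensive_step
        else:
            step_fn = ensemble_step(fit_ensemble(np.vstack(buffer)))

        # Placeholder policy update: keep a perturbed gain if it lowers the cost.
        candidate = gain + 0.3 * rng.normal()
        current_traj = sample_trajectory(step_fn, gain)
        candidate_traj = sample_trajectory(step_fn, candidate)
        if use_simulation:
            # Only simulation data is used to train the environment models.
            buffer += [current_traj, candidate_traj]
        if cost(candidate_traj) < cost(current_traj):
            gain = candidate
        sources.append("simulation" if use_simulation else "model")
    return gain, sources


if __name__ == "__main__":
    final_gain, sources = train()
    print(sources, final_gain)
```

In this sketch, only every second iteration calls the expensive step function, which is where the savings in a real setting would come from. A practical implementation would also need a criterion for deciding when the models are no longer trustworthy and fresh simulation data is required; the fixed schedule here sidesteps that question.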
