
Model-based deep reinforcement learning for accelerated learning from flow simulations

Andre Weiner, Janis Geise

TL;DR

This work addresses the high computational cost of deep reinforcement learning (DRL) for active flow control (AFC) by proposing model-based DRL with an ensemble of environment models (MEPPO). By alternating between trajectories sampled from high-fidelity CFD simulations and trajectories sampled from the learned models, the approach reduces training time substantially (by more than $65\%$ for the cylinder flow and by more than $80\%$ for the fluidic pinball) while achieving comparable or superior control performance. The two benchmark flows, cylinder wake control and the fluidic pinball, demonstrate that model ensembles provide robustness against model error and enable efficient exploration via coordinated sampling. The findings suggest that model-based DRL with ensembles can make DRL-based AFC practical and data-efficient for more complex, industrial-scale CFD problems.
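One common way to exploit an ensemble of environment models, used for example in ME-TRPO, is to let a randomly chosen member predict each step of a model-based trajectory, so that the policy cannot overfit to the errors of any single model. The following is a minimal sketch under stated assumptions: `LinearModel`, `rollout_ensemble`, and `ensemble_spread` are hypothetical names, the random linear models stand in for the learned environment models, and per-step random selection is not necessarily the coordinated sampling scheme used in the paper.

```python
import numpy as np


class LinearModel:
    """Hypothetical stand-in for one learned environment model: s' = A s + B a.

    A and B are random perturbations so that the ensemble members disagree
    slightly, mimicking model error. Real models would be trained on CFD data.
    """

    def __init__(self, rng: np.random.Generator, state_dim: int, action_dim: int):
        self.A = 0.9 * np.eye(state_dim) + 0.01 * rng.standard_normal((state_dim, state_dim))
        self.B = 0.1 * rng.standard_normal((state_dim, action_dim))

    def predict(self, state: np.ndarray, action: np.ndarray) -> np.ndarray:
        return self.A @ state + self.B @ action


def rollout_ensemble(models, policy, s0, horizon, rng):
    """Sample one trajectory; every step is predicted by a randomly drawn member."""
    states, actions = [s0], []
    s = s0
    for _ in range(horizon):
        a = policy(s)
        model = models[rng.integers(len(models))]  # random ensemble member per step
        s = model.predict(s, a)
        states.append(s)
        actions.append(a)
    return np.array(states), np.array(actions)


def ensemble_spread(models, state, action) -> float:
    """Mean standard deviation of the members' predictions (simple disagreement measure)."""
    preds = np.stack([m.predict(state, action) for m in models])
    return float(preds.std(axis=0).mean())


def policy(s: np.ndarray) -> np.ndarray:
    """Hypothetical linear feedback acting on the first sensor value."""
    return -0.5 * s[:1]


rng = np.random.default_rng(0)
models = [LinearModel(rng, state_dim=4, action_dim=1) for _ in range(5)]
states, actions = rollout_ensemble(models, policy, np.ones(4), horizon=20, rng=rng)
```

A large `ensemble_spread` along a trajectory signals that the models extrapolate beyond their training data, which is one motivation for periodically returning to CFD sampling.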

Abstract

In recent years, deep reinforcement learning has emerged as a technique to solve closed-loop flow control problems. Employing simulation-based environments in reinforcement learning enables a priori end-to-end optimization of the control system, provides a virtual testbed for safety-critical control applications, and offers deep insight into the control mechanisms. While reinforcement learning has been applied successfully to several relatively simple flow control benchmarks, a major bottleneck toward real-world applications is the high computational cost and turnaround time of flow simulations. In this contribution, we demonstrate the benefits of model-based reinforcement learning for flow control applications. Specifically, we optimize the policy by alternating between trajectories sampled from flow simulations and trajectories sampled from an ensemble of environment models. The model-based learning reduces the overall training time by up to $85\%$ for the fluidic pinball test case. Even larger savings are expected for more demanding flow simulations.
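Alternating between flow simulations and model trajectories requires a rule for when to return to the expensive CFD. The Python sketch below assumes a criterion in the spirit of ME-TRPO: after each model-based policy update, the policy's return is re-evaluated on every ensemble member, and the next episode samples from CFD if fewer than a fraction `threshold` of the members report an improvement. The names `next_source_is_cfd` and `train`, the callables, and the default threshold of 0.5 are hypothetical; the paper's actual switching rule may differ.

```python
from typing import Callable, List, Sequence


def next_source_is_cfd(old_returns: Sequence[float], new_returns: Sequence[float],
                       threshold: float = 0.5) -> bool:
    """True if too few ensemble members confirm that the last update improved the return."""
    improved = sum(new > old for old, new in zip(old_returns, new_returns))
    return improved / len(old_returns) < threshold


def train(n_episodes: int,
          cfd_episode: Callable[[int], None],
          model_episode: Callable[[int], None],
          model_returns: Callable[[], Sequence[float]],
          threshold: float = 0.5) -> List[str]:
    """Alternate between CFD-based and model-based episodes; return the source per episode."""
    sources: List[str] = []
    use_cfd = True  # the models need CFD data before they can be used
    prev = None
    for ep in range(n_episodes):
        if use_cfd:
            cfd_episode(ep)  # hypothetical expensive step: simulate, retrain ensemble, update policy
            sources.append("CFD")
            prev = model_returns()  # returns of the current policy on each ensemble member
            use_cfd = False
        else:
            model_episode(ep)  # hypothetical cheap step: ensemble trajectories, update policy
            sources.append("model")
            curr = model_returns()
            use_cfd = next_source_is_cfd(prev, curr, threshold)
            prev = curr
    return sources
```

With scripted per-member returns `[0,0,0,0]`, `[1,1,1,1]`, `[1,1,1,2]`, `[3,3,3,3]`, `[4,4,4,4]`, five episodes follow the sources CFD, model, model, CFD, model: the third episode confirms an improvement on only one of four members, which triggers a return to CFD.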

Paper Structure (21 sections, 15 equations, 12 figures, 4 tables, 3 algorithms)

Figures (12)

  • Figure 1: High-level overview of one model-based PPO episode.
  • Figure 2: AFC setup of the flow past a cylinder; based on schaefer1996, rabault2019, tokarev2020.
  • Figure 3: AFC setup of the fluidic pinball; based on noack2016.
  • Figure 4: Cylinder flow: episode-wise mean reward $R$ for different training configurations; the shaded area encloses one standard deviation below and above the mean; mean and standard deviation are computed over all trajectories of all seeds; for the model-based (MB) training, the markers indicate episodes with CFD-based trajectory sampling.
  • Figure 5: Cylinder flow: composition of the total model-based (MB) training time $T_\mathrm{MB}$, normalized with the model-free (MF) training time $T_\mathrm{MF}\approx 14\,\mathrm{h}$.
  • ...and 7 more figures
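The reported savings can be made concrete with the normalization used in Figure 5. Assuming the percentages quoted in the TL;DR and abstract are defined relative to the model-free training time, a saving $s$ translates into a model-based training time as follows:

```latex
\begin{align}
s &= 1 - \frac{T_\mathrm{MB}}{T_\mathrm{MF}}
  \quad\Longrightarrow\quad T_\mathrm{MB} = (1 - s)\,T_\mathrm{MF},\\
\text{cylinder } (s > 0.65,\ T_\mathrm{MF} \approx 14\,\mathrm{h}):\quad
  T_\mathrm{MB} &< 0.35 \times 14\,\mathrm{h} \approx 4.9\,\mathrm{h},\\
\text{pinball } (s = 0.85):\quad
  \frac{T_\mathrm{MF}}{T_\mathrm{MB}} &= \frac{1}{1 - 0.85} \approx 6.7.
\end{align}
```

In the best pinball case, the model-based training therefore runs roughly 6.7 times faster than the purely model-free baseline.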