Optimal Parallelization Strategies for Active Flow Control in Deep Reinforcement Learning-Based Computational Fluid Dynamics

Wang Jia, Hang Xu

TL;DR

This study validates an existing state-of-the-art DRL framework used for AFC problems and conducts extensive scalability benchmarks for individual components, investigating various hybrid parallelization configurations and proposing efficient parallelization strategies.

Abstract

Deep Reinforcement Learning (DRL) has emerged as a promising approach for handling highly dynamic and nonlinear Active Flow Control (AFC) problems. However, the computational cost of training DRL models presents a significant performance bottleneck. To address this challenge and enable efficient scaling on high-performance computing architectures, this study focuses on optimizing DRL-based algorithms in parallel settings. We validate an existing state-of-the-art DRL framework used for AFC problems and discuss its efficiency bottlenecks. Subsequently, by deconstructing the overall framework and conducting extensive scalability benchmarks for individual components, we investigate various hybrid parallelization configurations and propose efficient parallelization strategies. Moreover, we refine input/output (I/O) operations in multi-environment DRL training to tackle the critical overhead associated with data movement. Finally, we demonstrate the optimized framework on a typical AFC problem, for which near-linear scaling can be obtained for the overall framework. We achieve a significant boost in parallel efficiency, from around 49% to approximately 78%, and the training process is accelerated approximately 47-fold using 60 central processing unit (CPU) cores. These findings are expected to provide valuable insights for further advancements in DRL-based AFC studies. DRL-based AFC continues to be a prominent and actively studied problem of significant interest.
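The headline numbers above are linked by the conventional definitions of speedup, $S_N = T_1 / T_N$, and parallel efficiency, $E_N = S_N / N$, where $T_N$ is the wall-clock training time on $N$ cores. Assuming the paper uses these standard definitions (the abstract does not state them explicitly), a 47-fold speedup on 60 cores corresponds to the reported efficiency of roughly 78%, as this short check shows:

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S_N = T_1 / T_N (standard definition, assumed here)."""
    return t_serial / t_parallel


def parallel_efficiency(s: float, n_cores: int) -> float:
    """Parallel efficiency E_N = S_N / N (standard definition, assumed here)."""
    return s / n_cores


# Values quoted in the abstract: ~47x speedup using 60 CPU cores.
S = 47.0
N = 60
E = parallel_efficiency(S, N)
print(f"E = {E:.1%}")  # prints "E = 78.3%"
```

The 49% baseline cannot be converted back into a speedup in the same way, because the abstract does not say at what core count it was measured.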

Paper Structure

This paper contains 15 sections, 10 equations, 12 figures, and 2 tables.

Figures (12)

  • Figure 1: Description of the numerical setup. (a) Computational domain ($22D \times 4.1D$) for the flow around a cylinder; (b) boundary conditions for the computational domain; (c) details of the boundary conditions on the cylinder. Jets are located at $\Omega = 90^{\circ}$ and $\Omega = 270^{\circ}$ on the cylinder, each with a width of $\omega = 10^{\circ}$, and a parabolic velocity profile is imposed on each jet.
  • Figure 2: The agent and the environment are the fundamental components of reinforcement learning. The environment is the entity with which the agent interacts. At any given time step $t$, the agent first observes the current state $s_t$ of the environment, along with the corresponding reward $r_t$. Based on this state and reward information, the agent decides which action $a_t$ to take. The agent then receives feedback from the environment, obtaining the next state $s_{t+1}$ and reward $r_{t+1}$.
  • Figure 3: Instantaneous velocity field with black dots representing the positions of the 149 probes.
  • Figure 4: Illustration of the process and allocation of computational resources for parallel DRL training. Here, four training environments are used, each employing 5 MPI ranks for parallel CFD computations, for a total of 20 cores. During training, the agent continuously interacts with the CFD environments, generating a series of tuples ($s_i^m$, $a_i^m$, $r_i^m$), where the superscript $m$ denotes the specific environment and the subscript $i$ the timestep at which the agent interacts with that environment. The trajectory $\tau^m$ resulting from the agent-environment interaction is then used to calculate the gradient $\nabla_\theta J(\tau^m)$ and update the neural network parameters $\theta$. Training is parallelized across the four environments: in each training step, the four environment instances run simultaneously, which allows multiple samples to be collected concurrently. Importantly, each environment instance operates independently, providing an isolated setting for training.
  • Figure 5: Results of DRL training. (a) Cumulative reward at each episode during training. (b), (c), and (d) Time histories of the action $a$, lift coefficient $C_L$, and drag coefficient $C_D$, respectively, at selected episodes, illustrating convergence. (e) to (j) Vorticity contours at the end of each selected episode.
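The jet boundary condition in the Figure 1 caption can be sketched as a function of angular position on the cylinder. This is a minimal sketch under stated assumptions: the profile is taken to be $v(\theta) = V_{\max}\,[1 - (2(\theta - \Omega)/\omega)^2]$ inside each jet and zero elsewhere; the two jets are assumed to blow with opposite signs so that the net mass flux is zero (common in this benchmark, but not stated in the caption); and the names `jet_velocity`, `jet_pair`, and `v_max` are hypothetical. How the agent's action maps to `v_max` is not specified here.

```python
import numpy as np


def jet_velocity(theta_deg, center_deg, width_deg, v_max):
    """Parabolic jet profile (assumed form): v_max at the jet centre,
    zero at both jet edges and everywhere outside the jet."""
    theta = np.asarray(theta_deg, dtype=float)
    # Signed angular offset from the jet centre, wrapped into [-180, 180)
    offset = (theta - center_deg + 180.0) % 360.0 - 180.0
    # Normalised coordinate: -1 and +1 at the jet edges
    xi = 2.0 * offset / width_deg
    return np.where(np.abs(xi) <= 1.0, v_max * (1.0 - xi**2), 0.0)


def jet_pair(theta_deg, v_max, width_deg=10.0):
    """Jets at 90 and 270 degrees (Figure 1). Opposite signs are an
    assumption that makes the net mass flux zero."""
    return (jet_velocity(theta_deg, 90.0, width_deg, v_max)
            - jet_velocity(theta_deg, 270.0, width_deg, v_max))


# Sample the surface velocity at a few angles for a unit jet strength
angles = np.array([0.0, 85.0, 90.0, 95.0, 270.0])
print(jet_pair(angles, v_max=1.0))
```

The angle wrapping keeps the profile correct even for a jet centred near $0^{\circ}$, where a plain difference $\theta - \Omega$ would jump by $360^{\circ}$.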
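The interaction cycle described in the Figure 2 caption (observe $s_t$, choose $a_t$, receive $s_{t+1}$ and $r_{t+1}$) can be written as a short loop. In this sketch, `ToyFlowEnv` is a hypothetical one-dimensional stand-in for the CFD environment with made-up dynamics, and `policy` is a hypothetical fixed rule standing in for the neural network; neither reflects the paper's actual solver or agent.

```python
from dataclasses import dataclass


@dataclass
class ToyFlowEnv:
    """Hypothetical stand-in for the CFD environment: a single scalar state."""
    state: float = 1.0

    def reset(self) -> float:
        self.state = 1.0
        return self.state

    def step(self, action: float) -> tuple[float, float]:
        # Toy dynamics (assumption): the action damps the state,
        # and the reward penalises the remaining magnitude of the state.
        self.state = 0.9 * self.state - 0.5 * action
        reward = -abs(self.state)
        return self.state, reward


def policy(state: float) -> float:
    """Hypothetical fixed policy standing in for the neural network."""
    return 0.5 * state


def run_episode(env: ToyFlowEnv, n_steps: int) -> list[tuple[float, float, float]]:
    """Collect (s_t, a_t, r_{t+1}) tuples for one episode."""
    s_t = env.reset()
    trajectory = []
    for _ in range(n_steps):
        a_t = policy(s_t)                # agent acts on the observed state
        s_next, r_next = env.step(a_t)   # environment returns s_{t+1}, r_{t+1}
        trajectory.append((s_t, a_t, r_next))
        s_t = s_next
    return trajectory


episode = run_episode(ToyFlowEnv(), n_steps=5)
cumulative_reward = sum(r for _, _, r in episode)
```

Summing the rewards of an episode, as in the last line, gives the per-episode cumulative reward of the kind plotted against episode number during training.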
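The resource layout in the Figure 4 caption (four independent environments, each using 5 MPI ranks, 20 cores in total) and the concurrent collection of tuples $(s_i^m, a_i^m, r_i^m)$ can be sketched as follows. This is an illustration under assumptions, not the framework's orchestration code: `rollout` is a hypothetical placeholder that returns a synthetic trajectory instead of launching an MPI-parallel CFD solve, and a thread pool is used on the assumption that each environment drives an external MPI process, so the Python side mainly waits on those processes.

```python
from concurrent.futures import ThreadPoolExecutor

N_ENVS = 4          # training environments (Figure 4)
RANKS_PER_ENV = 5   # MPI ranks per CFD environment (Figure 4)


def total_cores(n_envs: int, ranks_per_env: int) -> int:
    """Cores needed when every environment runs its own MPI job."""
    return n_envs * ranks_per_env


def rollout(env_id: int, n_steps: int) -> list[tuple[int, int, float, float, float]]:
    """Hypothetical rollout for environment m = env_id.

    A real setup would advance an MPI-parallel CFD solver here; this
    placeholder returns a deterministic synthetic trajectory of
    (m, i, s_i^m, a_i^m, r_i^m) tuples."""
    traj = []
    s = float(env_id)
    for i in range(n_steps):
        a = -0.1 * s   # placeholder policy
        r = -abs(s)    # placeholder reward
        traj.append((env_id, i, s, a, r))
        s = s + a
    return traj


def collect(n_envs: int, n_steps: int) -> list[list[tuple[int, int, float, float, float]]]:
    # Environments are independent, so their rollouts can run concurrently.
    with ThreadPoolExecutor(max_workers=n_envs) as pool:
        return list(pool.map(rollout, range(n_envs), [n_steps] * n_envs))


print(total_cores(N_ENVS, RANKS_PER_ENV))  # prints 20
batch = collect(N_ENVS, n_steps=3)
```

Because `pool.map` returns results in input order, `batch[m]` holds the trajectory $\tau^m$ of environment $m$, which a learner could then use for the gradient step.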