Multi-agent reinforcement learning for the control of three-dimensional Rayleigh-Bénard convection

Joel Vasanth, Jean Rabault, Francisco Alcántara-Ávila, Mikael Mortensen, Ricardo Vinuesa

TL;DR

This work presents, for the first time, an implementation of MARL-based control of three-dimensional Rayleigh-Bénard convection (RBC), and demonstrates that the invariant property of MARL allows direct transfer of the learnt policy to a larger domain than the one used for training.

Abstract

Deep reinforcement learning (DRL) has found application in numerous use-cases pertaining to flow control. Multi-agent RL (MARL), a variant of DRL, has been shown to be more effective than single-agent RL in controlling flows exhibiting locality and translational invariance. We present, for the first time, an implementation of MARL-based control of three-dimensional Rayleigh-Bénard convection (RBC). Control is executed by modifying the temperature distribution along the bottom wall, which is divided into multiple control segments, each of which acts as an independent agent. Two regimes of RBC are considered at Rayleigh numbers $\mathrm{Ra}=500$ and $750$. Evaluation of the learned control policy reveals a reduction in convection intensity by $23.5\%$ and $8.7\%$ at $\mathrm{Ra}=500$ and $750$, respectively. The MARL controller converts irregularly shaped convective patterns to regular straight rolls with lower convection that resemble flow in a relatively more stable regime. We draw comparisons with proportional control at both $\mathrm{Ra}$ and show that MARL is able to outperform the proportional controller. The learned control strategy is complex, featuring different non-linear segment-wise actuator delays and actuation magnitudes. We also perform successful evaluations on a larger domain than used for training, demonstrating that the invariant property of MARL allows direct transfer of the learnt policy.
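
The transfer property mentioned at the end of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the policy here is a hypothetical single linear layer with a tanh squashing, and the names `shared_policy`, `act_on_wall`, `weights`, `bias` and the observation size `obs_dim` are assumptions made purely for illustration. The point it shows is that when every wall segment is driven by the same policy acting on its own local observations, the same parameters can be applied to a wall with any number of segments, and shifting the observations across segments shifts the actions in the same way.

```python
import numpy as np

def shared_policy(local_obs, weights, bias):
    """Hypothetical shared policy (assumption: one linear layer + tanh).

    Maps local observations of shape (..., obs_dim) to one bounded
    actuation value per segment, in [-1, 1].
    """
    return np.tanh(local_obs @ weights + bias)

def act_on_wall(obs_per_segment, weights, bias):
    """Apply the same policy parameters to every segment of the wall.

    obs_per_segment has shape (n_x, n_y, obs_dim); the result has shape
    (n_x, n_y), one actuation per segment. Nothing in the policy depends
    on n_x or n_y, which is what makes transfer to a larger wall possible.
    """
    return shared_policy(obs_per_segment, weights, bias)

# Illustrative sizes only; they are not taken from the paper.
obs_dim = 6
rng = np.random.default_rng(0)
weights = rng.normal(size=obs_dim)
bias = 0.1

obs_small = rng.normal(size=(4, 4, obs_dim))   # "training-sized" wall
obs_large = rng.normal(size=(8, 8, obs_dim))   # larger wall, same policy

actions_small = act_on_wall(obs_small, weights, bias)
actions_large = act_on_wall(obs_large, weights, bias)

# Shifting the observations by one segment shifts the actions identically.
shifted_actions = act_on_wall(np.roll(obs_small, 1, axis=0), weights, bias)
```

In this sketch the larger wall simply supplies more local observation blocks to the same function, so no retraining or resizing of parameters is involved.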

Paper Structure (19 sections, 12 equations, 22 figures, 1 table)

Figures (22)

  • Figure 1: Illustration of the computational domain used in the simulation of the RBC system with its dimensions. Shown is the instantaneous temperature field along with convection rolls that form in the system as a result of the RB instability.
  • Figure 2: An illustration of the MARL setup. The environment consists of the RBC system (top), shown with a projection of the bottom wall boundary (below). The bottom boundary is divided into $\mathrm{N}_{\mathrm{ag}}$ square segments where temperature control actuations are applied. Observation probes are distributed uniformly throughout the domain but, for illustration purposes, are depicted with black dots only in the blocks of the domain directly above four pseudo-environments. Each pseudo-environment, together with its streams of actions, states and rewards, forms an agent. The index $j$ denotes the $j^{\mathrm{th}}$ agent, with $j=1,\dots,\mathrm{N}_{\mathrm{ag}}$. All agents share the same neural network parameterisation in the PPO algorithm. A schematic of the two networks representing the actor and critic parameterisations is shown.
  • Figure 3: Baseline simulations showing the evolution of $\mathrm{Nu}$ with time $t$, from the purely conductive no-motion state ($\mathrm{Nu}=1$) to the onset of instability ($\mathrm{Nu}>1$) for (a) $\mathrm{Ra}=500$ and (b) five simulation runs labelled 1--5 for $\mathrm{Ra}=750$ categorized into three classes of baselines.
  • Figure 4: Temperature fields of the baselines shown in the mid-plane cross-section ($y=0$) of the domain for (a) $\mathrm{Ra}=500$ at $t=400$ and (b) for $\mathrm{Ra}=750$ for class 3 (run 5) at time $t=5000$. Temperature fields of runs 1--4 of the other classes of baselines for $\mathrm{Ra}=750$ are shown in Appendix \ref{sec:appendix_convection_structure}. Video files are provided in Online Resource 1 and 2 for $\mathrm{Ra}=500$ and 750 respectively (see Appendix \ref{sec:appendix_supplementary_information}).
  • Figure 5: Training curves for (a) $\mathrm{Ra}=500$ and (b) $\mathrm{Ra}=750$. The episodes on the horizontal axis are CFD episodes. The dashed horizontal lines correspond to $\mathrm{Nu}_{\mathrm{ref}}$ from the baseline. Grey lines are episode-wise $\mathrm{Nu}$ while black lines are for the moving average of $\mathrm{Nu}$ over a window of 20 episodes.
  • ...and 17 more figures
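
The smoothing described in the Figure 5 caption (a moving average of $\mathrm{Nu}$ over a window of 20 episodes) can be sketched as follows. This is a minimal illustration, not the authors' plotting code: the function name `moving_average` is hypothetical, the input values are synthetic, and the choice to return only fully covered windows is an assumption, since the caption does not say how the first 19 episodes are treated.

```python
import numpy as np

def moving_average(values, window=20):
    """Trailing moving average over fully covered windows only.

    Returns an array of length len(values) - window + 1, or an empty
    array if fewer than `window` values are available (assumption: the
    caption does not specify how partial windows are handled).
    """
    values = np.asarray(values, dtype=float)
    if values.size < window:
        return np.empty(0)
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

# Synthetic episode-wise values, used only to exercise the function.
episode_nu = np.arange(40, dtype=float)
smoothed_nu = moving_average(episode_nu, window=20)
```

Each entry of `smoothed_nu` is the mean of 20 consecutive episode values, which corresponds to the black curves described in the caption, while the raw `episode_nu` corresponds to the grey ones.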