A reinforcement learning strategy to automate and accelerate h/p-multigrid solvers

David Huergo, Laura Alonso, Saumitra Joshi, Adrian Juanicoteca, Gonzalo Rubio, Esteban Ferrer

TL;DR

The paper accelerates high-order Flux Reconstruction (FR) solvers by automatically tuning h/p-multigrid parameters with Proximal Policy Optimization (PPO). Treating the FR solver as the RL environment, the PPO agent dynamically selects the number of pre- and post-smoothing sweeps and the correction fraction across p-levels, aiming to minimize the residual while reducing runtime. Results show substantial speedups and improved robustness on the 1D advection-diffusion and Burgers' equations, with the largest gains (more than 100x) on nonuniform meshes when the agent is trained in an h/p-multigrid setting; transfer across configurations is possible but sensitive to the training context. This work demonstrates that RL can automate multigrid control in high-order methods, potentially enabling scalable, adaptive solvers for more complex geometries and discretizations.

Abstract

We explore a reinforcement learning strategy to automate and accelerate h/p-multigrid methods in high-order solvers. Multigrid methods are very efficient but require fine-tuning of numerical parameters, such as the number of smoothing sweeps per level and the correction fraction (i.e., the proportion of the corrected solution that is transferred from a coarser grid to a finer grid). The objective of this paper is to use a proximal policy optimization algorithm to automatically tune the multigrid parameters and, by doing so, improve the stability and efficiency of the h/p-multigrid strategy. Our findings reveal that the proposed reinforcement learning h/p-multigrid approach significantly accelerates and improves the robustness of steady-state simulations for the one-dimensional advection-diffusion and nonlinear Burgers' equations, when discretized using high-order h/p methods, on uniform and nonuniform grids.
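To make the tuned parameters concrete, the following is a minimal sketch of a two-grid cycle in which the number of pre- and post-smoothing sweeps and the correction fraction appear as explicit arguments. It is not the paper's solver: it assumes a second-order finite-difference discretization of the 1D Poisson problem with h-coarsening and a damped-Jacobi smoother, in place of the high-order Flux Reconstruction scheme and p-levels used by the authors. All function names (`two_grid_cycle`, `relative_residual`, and so on) are hypothetical, and the parameters are fixed by hand here rather than chosen by a PPO agent.

```python
import numpy as np

def laplacian_1d(n, h):
    """Dense finite-difference matrix for -u'' with homogeneous Dirichlet boundaries."""
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def jacobi(A, u, f, sweeps, omega=2.0 / 3.0):
    """Apply `sweeps` damped-Jacobi smoothing iterations to A u = f."""
    d = np.diag(A)
    for _ in range(sweeps):
        u = u + omega * (f - A @ u) / d
    return u

def restrict(r):
    """Full-weighting restriction from 2m+1 fine points to m coarse points."""
    return 0.25 * r[0:-2:2] + 0.5 * r[1:-1:2] + 0.25 * r[2::2]

def prolong(e_c):
    """Linear interpolation from m coarse points to 2m+1 fine points."""
    e_f = np.zeros(2 * e_c.size + 1)
    e_f[1::2] = e_c                                  # coincident points
    padded = np.concatenate(([0.0], e_c, [0.0]))     # zero boundary values
    e_f[0::2] = 0.5 * (padded[:-1] + padded[1:])     # midpoints
    return e_f

def two_grid_cycle(A_f, A_c, u, f, pre, post, alpha):
    """One two-grid cycle; `alpha` is the correction fraction (assumed in [0, 1])."""
    u = jacobi(A_f, u, f, pre)                       # pre-smoothing
    r_c = restrict(f - A_f @ u)                      # residual on the coarse grid
    e_c = np.linalg.solve(A_c, r_c)                  # exact coarse-grid solve
    u = u + alpha * prolong(e_c)                     # apply a fraction of the correction
    return jacobi(A_f, u, f, post)                   # post-smoothing

def relative_residual(cycles, pre, post, alpha, m=31):
    """Run `cycles` two-grid cycles from a zero guess; return ||r|| / ||r0||."""
    n = 2 * m + 1
    h = 1.0 / (n + 1)
    A_f, A_c = laplacian_1d(n, h), laplacian_1d(m, 2.0 * h)
    f = np.ones(n)
    u = np.zeros(n)
    r0 = np.linalg.norm(f - A_f @ u)
    for _ in range(cycles):
        u = two_grid_cycle(A_f, A_c, u, f, pre, post, alpha)
    return np.linalg.norm(f - A_f @ u) / r0

if __name__ == "__main__":
    for alpha in (1.0, 0.5):
        print(alpha, relative_residual(cycles=10, pre=2, post=2, alpha=alpha))
```

For this linear model problem the full correction (`alpha=1.0`) is expected to converge fastest. A reduced correction fraction becomes more relevant for stability in nonlinear or stiff settings, which is the kind of trade-off the abstract describes the agent as learning.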

Paper Structure (10 sections, 14 equations, 2 figures, 2 tables)

Figures (2)

  • Figure 1: Schematic of a V-cycle (left), a one-level h-multigrid (center), and a p-multigrid (right) [Antonietti2018]. Dots represent the degrees of freedom.
  • Figure 2: Actor (left) and critic (right) neural network architectures used in the implemented PPO. Each box shows the layer type, followed in parentheses by the number of neurons or the dropout proportion. Arrows show the direction of information flow. ReLU is used as the activation function for all layers.