Machine Learning-driven Autotuning of Graphics Processing Unit Accelerated Computational Fluid Dynamics for Enhanced Performance

Weicheng Xue, Christopher John Roy

TL;DR

This work addresses autotuning of GPU kernel scheduling in a CFD code by learning a mapping from 14 tuning parameters (gang size and vector length) to runtime using a fully connected neural network. The study evaluates the approach on three GPUs (C2075, P100, V100) with both independent and combined training, demonstrating accurate runtime predictions even with a limited training set (7500 per GPU, 2500 for testing) and substantial reduction in search effort. Combining data across GPUs (and including GPU type as a feature) improves prediction accuracy and generalization, indicating strong potential for cross-hardware autotuning. The results suggest ML-driven autotuning can enhance performance in GPU-accelerated scientific computing and motivate integration into compiler backends and broader HPC workflows, with future work extending to more problems, problem sizes, schemes, and devices, and exploring reinforcement learning approaches.
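As an illustration of what "including GPU type as a feature" could look like for combined training, the sketch below appends a one-hot GPU indicator to the 14 tuning parameters. The function name, the log2 scaling, and the one-hot layout are assumptions made for this example; the summary does not state how the authors encode their inputs.

```python
import math

# GPU models named in the summary; the ordering here is arbitrary.
GPU_TYPES = ("C2075", "P100", "V100")
NUM_TUNING_PARAMS = 14  # gang sizes and vector lengths, per the summary


def encode_sample(tuning_params, gpu_type):
    """Build one network input: 14 scaled parameters plus a one-hot GPU type.

    Assumption: parameters are positive integers and are log2-scaled so that
    values such as 32 and 1024 land on a comparable numeric range.
    """
    if len(tuning_params) != NUM_TUNING_PARAMS:
        raise ValueError("expected 14 tuning parameters")
    if gpu_type not in GPU_TYPES:
        raise ValueError(f"unknown GPU type: {gpu_type}")
    scaled = [math.log2(p) for p in tuning_params]
    one_hot = [1.0 if gpu_type == g else 0.0 for g in GPU_TYPES]
    return scaled + one_hot


# The same configuration on two GPUs differs only in the last three entries.
x_p100 = encode_sample([128] * NUM_TUNING_PARAMS, "P100")
x_v100 = encode_sample([128] * NUM_TUNING_PARAMS, "V100")
```

With an encoding of this kind, samples from all three GPUs can share one training set, and the network can learn hardware-specific offsets from the indicator entries.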

Abstract

Optimizing the performance of computational fluid dynamics (CFD) applications accelerated by graphics processing units (GPUs) is crucial for efficient simulations. In this study, we employed a machine learning-based autotuning technique to optimize 14 key parameters related to GPU kernel scheduling, including the number of thread blocks and threads within a block. Our approach utilizes fully connected neural networks as the underlying machine learning model, with the tuning parameters as inputs to the neural networks and the actual execution time of a simulation as the outputs. To assess the effectiveness of our autotuning approach, we conducted experiments on three different types of GPUs, with computational speeds ranging from low to high. We performed independent training for each GPU model and also explored combined training across multiple GPU models. By leveraging artificial neural networks, our autotuning technique achieved remarkable results in tuning a wide range of parameters, leading to enhanced performance for a CFD code. Importantly, our approach demonstrated its efficacy while requiring only a small fraction of samples from the large parameter search space. This efficiency is attributed to the effectiveness of the fully connected neural networks in capturing the complex relationships between the parameter settings and the resulting performance. Overall, our study showcases the potential of machine learning, specifically fully connected neural networks, in autotuning GPU-accelerated CFD codes. By leveraging this approach, researchers and practitioners can achieve high performance in scientific simulations with optimized parameter configurations.
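The workflow described in the abstract (sample a small part of the search space, fit a fully connected network from parameters to runtime, then use the cheap predictions to pick a configuration) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: the runtime function is synthetic, and the value grid, network size, learning rate, and sample counts are all assumptions chosen to keep the example small.

```python
import math
import random

random.seed(0)
N_PARAMS = 14                       # number of tuning parameters in the abstract
CHOICES = [32, 64, 128, 256, 512]   # hypothetical allowed values per parameter


def synthetic_runtime(cfg):
    # Stand-in for timing a real CFD run; not data from the paper.
    return 1.0 + sum((math.log2(v) - 7.0) ** 2 for v in cfg) / N_PARAMS


def features(cfg):
    # Log2-scale and centre so every input lies in [-1, 1].
    return [(math.log2(v) - 7.0) / 2.0 for v in cfg]


def random_config():
    return [random.choice(CHOICES) for _ in range(N_PARAMS)]


class TinyMLP:
    """Fully connected net: one tanh hidden layer, scalar runtime output."""

    def __init__(self, n_in, n_hidden):
        self.w1 = [[random.uniform(-0.3, 0.3) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [random.uniform(-0.3, 0.3) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2, h

    def predict(self, x):
        return self.forward(x)[0]

    def sgd_step(self, x, y, lr):
        pred, h = self.forward(x)
        err = pred - y  # gradient of 0.5 * (pred - y)^2 w.r.t. pred
        for j, hj in enumerate(h):
            # Back-propagate through tanh before w2[j] is modified.
            grad_h = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * err * hj
            self.b1[j] -= lr * grad_h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_h * xi
        self.b2 -= lr * err


def mse(model, data):
    return sum((model.predict(x) - y) ** 2 for x, y in data) / len(data)


# 1. Time a small random sample of configurations (synthetic here).
train = [(features(c), synthetic_runtime(c))
         for c in (random_config() for _ in range(200))]

# 2. Fit the surrogate model with plain stochastic gradient descent.
model = TinyMLP(N_PARAMS, 8)
loss_before = mse(model, train)
for _ in range(50):
    random.shuffle(train)
    for x, y in train:
        model.sgd_step(x, y, lr=0.005)
loss_after = mse(model, train)

# 3. Rank unseen candidates by predicted runtime instead of running them all.
candidates = [random_config() for _ in range(500)]
best = min(candidates, key=lambda c: model.predict(features(c)))
print(f"training MSE: {loss_before:.3f} -> {loss_after:.3f}")
print("predicted-best configuration:", best)
```

In a real setting, the predicted-best configuration would still be timed on the target GPU to confirm the prediction, since the surrogate only approximates the measured runtime.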

Paper Structure

This paper contains 13 sections, 7 equations, 14 figures, 4 tables.

Figures (14)

  • Figure 1: Multilevel parallelism for GPU.
  • Figure 2: Two-parameter manual tuning for a buoyancy-driven cavity code.
  • Figure 3: Buoyancy-driven cavity cases.
  • Figure 4: Artificial neural network.
  • Figure 5: Stencil for MUSCL extrapolation.
  • ...and 9 more figures