ACC Saturator: Automatic Kernel Optimization for Directive-Based GPU Code

Kazuaki Matsumura, Simon Garcia De Gonzalo, Antonio J. Peña

TL;DR

This paper applies equality saturation to optimize the sequential code used in directive-based GPU programming, and presents a fully automated framework that simultaneously reduces computation, reduces memory access, and raises memory throughput.

Abstract

Automatic code optimization is a complex process that typically involves the application of multiple discrete algorithms that modify the program structure irreversibly. However, the design of these algorithms is often monolithic, and they require repetitive implementation to perform similar analyses due to the lack of cooperation. To address this issue, modern optimization techniques, such as equality saturation, allow for exhaustive term rewriting at various levels of inputs, thereby simplifying compiler design. In this paper, we propose equality saturation to optimize sequential codes utilized in directive-based programming for GPUs. Our approach realizes less computation, less memory access, and high memory throughput simultaneously. Our fully-automated framework constructs single-assignment forms from inputs to be entirely rewritten while keeping dependencies and extracts optimal cases. Through practical benchmarks, we demonstrate a significant performance improvement on several compilers. Furthermore, we highlight the advantages of computational reordering and emphasize the significance of memory-access order for modern GPUs.
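
To make the idea of exhaustive term rewriting followed by extraction concrete, here is a minimal Python sketch. It is not the paper's implementation: real equality saturation stores equivalent terms compactly in an e-graph, whereas this toy simply enumerates a set of equivalent terms. The rule set (commutativity and factoring), the tuple encoding of terms, and the operation-count cost function are all assumptions chosen for illustration.

```python
# Toy illustration only: terms are nested tuples (op, left, right) or leaf strings.
# A real equality-saturation engine would use an e-graph instead of a plain set.

def rewrites(t):
    """Yield terms equal to t obtained by applying one rule at the root."""
    if not isinstance(t, tuple):
        return
    op, a, b = t
    if op in ("+", "*"):
        yield (op, b, a)  # commutativity: a op b == b op a
    if (op == "+" and isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == "*" and b[0] == "*" and a[1] == b[1]):
        # factoring: x*y + x*z == x*(y+z)
        yield ("*", a[1], ("+", a[2], b[2]))

def neighbors(t):
    """Yield every term reachable by one rewrite anywhere inside t."""
    yield from rewrites(t)
    if isinstance(t, tuple):
        op, a, b = t
        for a2 in neighbors(a):
            yield (op, a2, b)
        for b2 in neighbors(b):
            yield (op, a, b2)

def saturate(t, limit=10_000):
    """Collect all terms reachable from t until no new term appears."""
    seen = {t}
    frontier = [t]
    while frontier and len(seen) < limit:
        next_frontier = []
        for u in frontier:
            for v in neighbors(u):
                if v not in seen:
                    seen.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return seen

def cost(t):
    """Hypothetical cost model: the number of arithmetic operations."""
    if not isinstance(t, tuple):
        return 0
    return 1 + cost(t[1]) + cost(t[2])

def optimize(t):
    """Extract a cheapest term from the saturated set."""
    return min(saturate(t), key=cost)

# b*a + a*c only factors after the left product is commuted to a*b.
example = ("+", ("*", "b", "a"), ("*", "a", "c"))
print(cost(example), cost(optimize(example)))  # prints: 3 2
```

Because every rewrite is kept rather than applied destructively, the factoring rule still fires on `b*a + a*c` even though it only matches once an inner product has been commuted. A single irreversible pass that applied rules in a fixed order could miss this.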

Paper Structure (20 sections, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Overview of ACC Saturator
  • Figure 2: NPB's speedup results on NVIDIA A100-PCIE-40GB for each variation compared to the original. Legend: NVHPC, GCC.
  • Figure 3: Breakdown of NPB-BT. The background color depicts the cumulative ratio of the execution time along the speedup points for each kernel. Series: CSE, CSE+SAT, CSE+BULK, ACCSAT (CSE+SAT+BULK).
  • Figure 4: Speedup results of the SPEC ACCEL benchmark suite on NVIDIA A100-PCIE-40GB. Legend: NVHPC, GCC, Clang.
  • Figure 5: NPB's speedup results on NVIDIA A100-SXM4-80GB. Legend: NVHPC, GCC.
  • ...and 1 more figure