Expression Acceleration: Seamless Parallelization of Typed High-Level Languages

Lars Hummelgren, John Wikman, Oscar Eriksson, Philipp Haller, David Broman

TL;DR

The paper presents expression acceleration, a compiler-based approach that lets programmers mark expressions in a statically typed high-level language for GPU execution. The compiler extracts the accelerated code together with its dependencies via lambda lifting, assigns each accelerated binding to one of two backends (Futhark or CUDA), and automatically generates marshaling code to move data between CPU and GPU. It enforces well-formedness constraints through static rules for each backend and dynamic checks in debug mode. An evaluation on Futhark and CUDA benchmarks reports competitive performance and substantial speedups, supporting the practicality of integrating GPU acceleration into existing high-level programs.

Abstract

Efficient parallelization of algorithms on general-purpose GPUs is essential in many areas today. However, it is a non-trivial task for software engineers to utilize GPUs to improve the performance of high-level programs in general. Although many domain-specific approaches are available for GPU acceleration, it is difficult to accelerate existing high-level programs without rewriting parts of the programs using low-level GPU code. We present a compiler implementation using an alternative approach called expression acceleration. This approach marks expressions for acceleration, and the compiler automatically infers which dependent code needs to be accelerated. We design and implement a compiler supporting expression acceleration for a statically typed functional language and evaluate its applicability and performance.
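
The TL;DR notes that extraction relies on lambda lifting. As a rough illustration of that general technique only, the sketch below lifts a function in a toy lambda calculus written in Python. It is not the paper's compiler or its MExpr language: the AST classes `Var`, `Lam`, `App` and the helpers `free_vars` and `lambda_lift` are hypothetical names introduced here, and the sketch assumes variable names are unique. The idea shown is that every variable a function captures from its surroundings becomes an explicit leading parameter, so the function no longer depends on its enclosing scope and could in principle be moved elsewhere (for example, into separately compiled accelerated code).

```python
from dataclasses import dataclass

# Toy lambda-calculus AST (hypothetical; not the paper's MExpr definition).
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: object

@dataclass(frozen=True)
class App:
    fn: object
    arg: object

def free_vars(expr, bound=frozenset()):
    """Return the set of variable names that occur free in expr."""
    if isinstance(expr, Var):
        return set() if expr.name in bound else {expr.name}
    if isinstance(expr, Lam):
        return free_vars(expr.body, bound | {expr.param})
    if isinstance(expr, App):
        return free_vars(expr.fn, bound) | free_vars(expr.arg, bound)
    raise TypeError(f"unexpected node: {expr!r}")

def lambda_lift(fn, name, top_level=frozenset()):
    """Lift fn to a closed top-level function called `name`.

    Captured variables (free variables that are not already top-level
    names) become extra leading parameters. Returns the lifted function
    and the expression that replaces fn at its original position.
    """
    captured = sorted(free_vars(fn) - set(top_level))
    lifted = fn
    for v in reversed(captured):      # wrap so the first captured name is outermost
        lifted = Lam(v, lifted)
    call = Var(name)
    for v in captured:                # pass captured values in the same order
        call = App(call, Var(v))
    return lifted, call

# Example: \x. mul scale x, where `scale` is captured and `mul` is top-level.
body = App(App(Var("mul"), Var("scale")), Var("x"))
scale_by = Lam("x", body)
lifted, call = lambda_lift(scale_by, "scale_by_lifted", top_level={"mul"})
```

After lifting, `lifted` only refers to its own parameters and the top-level name `mul`, and the original occurrence is replaced by `scale_by_lifted scale`. Sorting the captured names is an arbitrary choice made here to keep the parameter order deterministic.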

Paper Structure

This paper contains 22 sections, 8 equations, 12 figures, 2 tables, and 1 algorithm.

Figures (12)

  • Figure 1: The recommended workflow when using acceleration, from a source program on the left-hand side to an accelerated binary on the right-hand side. Gray boxes are artifacts and blue boxes are processes.
  • Figure 2: Overview of the pipeline of the accelerate compiler. The gray rectangles represent artifacts, and the blue rounded rectangles represent compiler passes. Note that we omit most intermediate artifacts for brevity. We associate integers with each compiler pass to indicate the order in which it takes place.
  • Figure 3: Input program making use of acceleration. This program is the input to the accelerate extraction.
  • Figure 4: Program of Listing \ref{lst:lamlift1} after applying the first step of the accelerate extraction. The accelerated code produced by extraction consists of the highlighted parts of the program.
  • Figure 6: Definition of the base language, a subset of the MExpr language.
  • ...and 7 more figures