AXI4MLIR: User-Driven Automatic Host Code Generation for Custom AXI-Based Accelerators

Nicolas Bohm Agostini, Jude Haris, Perry Gibson, Malith Jayaweera, Norm Rubin, Antonino Tumeo, José L. Abellán, José Cano, David Kaeli

TL;DR

AXI4MLIR is an extension of the MLIR compiler framework that automates the generation of host-accelerator driver code. It lets users specify accelerator features and communication patterns and exploit the host memory hierarchy.

Abstract

This paper addresses the need for automatic and efficient generation of host driver code for arbitrary custom AXI-based accelerators targeting linear algebra algorithms, an important workload in various applications, including machine learning and scientific computing. While existing tools have focused on automating accelerator prototyping, little attention has been paid to the host-accelerator interaction. This paper introduces AXI4MLIR, an extension of the MLIR compiler framework designed to facilitate the automated generation of host-accelerator driver code. With new MLIR attributes and transformations, AXI4MLIR empowers users to specify accelerator features (including their instructions) and communication patterns and to exploit the host memory hierarchy. We demonstrate AXI4MLIR's versatility across different types of accelerators and problems, showing significant CPU cache reference reductions (up to 56%) and up to a 1.65x speedup compared to manually optimized driver code implementations. The AXI4MLIR implementation is open source and available at: https://github.com/AXI4MLIR/axi4mlir.

Paper Structure (21 sections, 17 figures, 1 table)

Figures (17)

  • Figure 1: Typical host-accelerator system design, highlighting (blue color) relevant parameters that should be considered for efficient generation of host-accelerator communication code.
  • Figure 2: MLIR representations of a Matrix-Matrix Multiplication Operation in different abstractions.
  • Figure 3: The underlying data structure of a rank==N MLIR buffer.
  • Figure 4: AXI4MLIR Compiler Flow. The numbered elements are the contributions of this work.
  • Figure 5: Accelerator and CPU configuration file.
  • ...and 12 more figures