Developing a BLAS library for the AMD AI Engine

Tristan Laan, Tiziano De Matteis

TL;DR

We present the ongoing project AIEBLAS, an open-source, expandable implementation of Basic Linear Algebra Routines (BLAS) for the AMD AI Engine.

Abstract

Spatial (dataflow) computer architectures can mitigate the control and performance overhead of classical von Neumann architectures such as traditional CPUs. Driven by the popularity of Machine Learning (ML) workloads, spatial devices are being marketed as ML inference accelerators. Although these devices provide a rich software ecosystem for ML practitioners, their adoption in other scientific domains is hindered by a steep learning curve and a lack of reusable software, which makes them inaccessible to non-experts. We present our ongoing project AIEBLAS, an open-source, expandable implementation of Basic Linear Algebra Routines (BLAS) for the AMD AI Engine. Numerical routines are designed to be easily reused, customized, and composed in dataflow programs, leveraging the characteristics of the targeted device without requiring the user to deeply understand the underlying hardware and programming model.

Paper Structure (5 sections, 3 figures)

This paper contains 5 sections, 3 figures.

Figures (3)

  • Figure 1: AIEBLAS development workflow.
  • Figure 2: Overview of the AMD Versal ACAP Architecture.
  • Figure 3: AIEBLAS evaluation results for different input sizes. We considered implementations with data stored in off-chip memory and data movers in programmable logic (PL), and with data synthetically generated on the AIE array (no PL). For axpydot, we considered the dataflow (w/ DF) and no-dataflow (w/o DF) implementations.
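
The difference between the dataflow and no-dataflow variants of axpydot mentioned in the Figure 3 caption can be illustrated with a small host-side sketch. It assumes the usual extended-BLAS definition of axpydot (z = w - alpha * v, followed by the dot product of z and u). All function names are hypothetical and are not the AIEBLAS API; Python generators only stand in for on-chip streams between kernels, so the sketch says nothing about performance on the device.

```python
from typing import Iterable, Iterator, Sequence


def axpy_stream(alpha: float, x: Iterable[float], y: Iterable[float]) -> Iterator[float]:
    """Yield alpha * x[i] + y[i] one element at a time (a streaming axpy)."""
    for xi, yi in zip(x, y):
        yield alpha * xi + yi


def dot_stream(x: Iterable[float], y: Iterable[float]) -> float:
    """Consume two element streams and accumulate their dot product."""
    acc = 0.0
    for xi, yi in zip(x, y):
        acc += xi * yi
    return acc


def axpydot_no_dataflow(alpha: float, w: Sequence[float],
                        v: Sequence[float], u: Sequence[float]) -> float:
    """Run axpy to completion, store z in a buffer, then run dot on it."""
    z = list(axpy_stream(-alpha, v, w))  # z = w - alpha * v, fully materialized
    return dot_stream(z, u)


def axpydot_dataflow(alpha: float, w: Sequence[float],
                     v: Sequence[float], u: Sequence[float]) -> float:
    """Chain axpy into dot: each z[i] is consumed as soon as it is produced."""
    return dot_stream(axpy_stream(-alpha, v, w), u)


if __name__ == "__main__":
    w, v, u = [1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [1.0, 0.0, 2.0]
    print(axpydot_no_dataflow(2.0, w, v, u))
    print(axpydot_dataflow(2.0, w, v, u))
```

Both variants compute the same value. The only difference is whether the intermediate vector z is written to a buffer before the second routine starts or is forwarded element by element, which is the kind of composition the abstract refers to.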