
EA4RCA: Efficient AIE accelerator design framework for Regular Communication-Avoiding Algorithm

W. B. Zhang, Y. Q. Liu, T. H. Zang, Z. S. Bao

TL;DR

The paper addresses the underutilization of AIE resources for regular communication-avoiding (RCA) algorithms on Versal ACAP by introducing EA4RCA, a top-down accelerator design framework paired with an AIE Graph Code Generator. It decouples computation and communication through RCA-driven dataflow patterns, enabling flexible computing and data engines that scale across many AIE cores. Empirical results on MM, Filter2D, FFT, and MM-T show substantial throughput and energy-efficiency improvements over state-of-the-art designs, with peak performance reaching the multi-thousand GOPS range and energy efficiency surpassing prior solutions. The work demonstrates the practicality of RCA-focused AIE accelerator deployment, highlights improved development efficiency and resource utilization on the VCK5000 platform, and charts paths toward broader automated deployment of AIE-based accelerators.

Abstract

With the introduction of the Adaptive Intelligence Engine (AIE), the Versal Adaptive Compute Acceleration Platform (Versal ACAP) has attracted considerable attention. However, the Vitis Libraries and the limited existing research have focused mainly on how to invoke AIE modules, without a thorough discussion of how to use the AIE effectively in its typical use cases. As a result, the widespread adoption of Versal ACAP has been restricted. Communication-avoiding (CA) algorithms are considered a typical application of the AIE architecture, yet how to use the AIE effectively for CA applications still requires further exploration. We propose EA4RCA (Efficient AIE accelerator design framework for Regular Communication-Avoiding Algorithms), a top-down customized design framework tailored to CA algorithms with regular communication patterns (RCA) and equipped with AIE Graph Code Generator software to accelerate the AIE design process. The primary objective of the framework is to maximize AIE performance while providing high-speed data streaming services. Experiments show that, for the RCA algorithms Filter2D and Matrix Multiplication (MM), which have lower communication requirements, and the RCA algorithm FFT, which has higher communication requirements, accelerators implemented with the EA4RCA framework achieve maximum throughput improvements of 22.19x, 1.05x, and 3.88x, respectively, over the current highest-performance acceleration schemes (SOTA), and maximum energy-efficiency improvements of 6.11x, 1.30x, and 7.00x.
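To make the ideas of "communication avoidance" and "decoupled data and compute engines" more concrete, here is a minimal Python sketch under our own assumptions. It uses classic blocked matrix multiplication, not EA4RCA's actual implementation or AIE kernel code. A hypothetical `data_engine` streams tile pairs in a fixed, regular order, and a hypothetical `compute_engine` operates only on tile-local data. The sketch counts the words streamed into local memory to show why tiling reduces communication: with tile size `bs`, traffic is 2n³/bs words, compared with 2n³ for an untiled (bs = 1) inner-product loop.

```python
from typing import Dict, Iterator, List, Tuple

Matrix = List[List[float]]


def load_tile(m: Matrix, r: int, c: int, bs: int) -> Matrix:
    """Copy the bs x bs tile whose top-left corner is (r*bs, c*bs)."""
    return [row[c * bs:(c + 1) * bs] for row in m[r * bs:(r + 1) * bs]]


def data_engine(a: Matrix, b: Matrix, bs: int) -> Iterator[Tuple[int, int, Matrix, Matrix]]:
    """Hypothetical data engine: streams (i, j, A_ik, B_kj) tile pairs in a fixed, regular order."""
    t = len(a) // bs
    for i in range(t):
        for j in range(t):
            for k in range(t):
                yield i, j, load_tile(a, i, k, bs), load_tile(b, k, j, bs)


def compute_engine(acc: Matrix, a_t: Matrix, b_t: Matrix) -> None:
    """Hypothetical compute engine: acc += a_t @ b_t, touching only tile-local data."""
    bs = len(acc)
    for r in range(bs):
        for c in range(bs):
            acc[r][c] += sum(a_t[r][k] * b_t[k][c] for k in range(bs))


def blocked_matmul(a: Matrix, b: Matrix, bs: int) -> Tuple[Matrix, int]:
    """Return C = A @ B and the number of words streamed into local (tile) memory."""
    n = len(a)
    assert n % bs == 0, "matrix size must be a multiple of the tile size"
    accs: Dict[Tuple[int, int], Matrix] = {}
    words_moved = 0
    for i, j, a_t, b_t in data_engine(a, b, bs):
        words_moved += 2 * bs * bs  # one A tile + one B tile enter local memory per step
        acc = accs.setdefault((i, j), [[0.0] * bs for _ in range(bs)])
        compute_engine(acc, a_t, b_t)
    # Write each finished accumulator tile back to the full output matrix.
    c = [[0.0] * n for _ in range(n)]
    for (i, j), acc in accs.items():
        for r in range(bs):
            c[i * bs + r][j * bs:(j + 1) * bs] = acc[r]
    return c, words_moved


def naive_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Reference result for checking the blocked version."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    n = 8
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i * j % 5) for j in range(n)] for i in range(n)]
    c4, moved4 = blocked_matmul(a, b, 4)
    _, moved1 = blocked_matmul(a, b, 1)
    assert c4 == naive_matmul(a, b)
    print(moved1, moved4)  # 1024 256
```

The separation between the two engines is the point of the sketch. The data engine's access pattern is fixed and regular, so it can be planned ahead of time and streamed. The compute engine never reaches outside its tiles. Larger tiles give more on-chip reuse and therefore less traffic, which in practice is bounded by local memory size.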

Paper Structure

This paper contains 17 sections, 2 equations, 7 figures, 10 tables, and 1 algorithm.

Figures (7)

  • Figure 1: EA4RCA framework architecture.
  • Figure 2: EA4RCA framework running process.
  • Figure 3: Processing unit architecture.
  • Figure 4: Task processing component structure.
  • Figure 5: SSC four service mode structure and service timing.
  • ...and 2 more figures