Energy-Efficient Hardware Acceleration of Whisper ASR on a CGLA
Takuto Ando, Yu Eto, Ayumu Takeuchi, Yasuhiko Nakashima
TL;DR
The paper tackles the energy-efficiency challenge of running Whisper ASR on edge devices by implementing Whisper's core dot-product kernel on the IMAX CGLA accelerator. Using hardware/software co-design, the authors evaluate the system on an FPGA prototype and project its performance for a 28 nm ASIC. The projected ASIC shows better energy efficiency, measured as power-delay product (PDP), than the Jetson AGX Orin and RTX 4090, especially with Q8_0 quantization. The work introduces FP16 and Q8_0 kernels, data-handling optimizations, and a 32 KB LMM configuration chosen to maximize kernel coverage while minimizing static power, yielding a compute-bound implementation on IMAX. The study establishes the CGRA-like IMAX as a viable, energy-efficient platform for ASR at the edge and outlines directions for scaling to larger Whisper models.
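This summary does not spell out the Q8_0 kernel. As a minimal sketch, assuming Q8_0 refers to the ggml block format (32 signed 8-bit values sharing one FP16 scale), a blockwise dot product could look like the following. The function names are hypothetical, and this is plain Python for illustration, not the IMAX kernel.

```python
import struct

QK8_0 = 32  # values per block in the assumed ggml-style Q8_0 layout


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_q8_0(values):
    """Split `values` into blocks of 32 and return (scale, int8 list) pairs."""
    if len(values) % QK8_0 != 0:
        raise ValueError("length must be a multiple of 32")
    blocks = []
    for i in range(0, len(values), QK8_0):
        chunk = values[i:i + QK8_0]
        amax = max(abs(v) for v in chunk)
        scale = to_fp16(amax / 127.0)  # one FP16 scale per block
        if scale == 0.0:
            quants = [0] * QK8_0
        else:
            # Python's round() rounds halves to even; a C kernel would
            # typically use roundf(). Clamp to keep values in int8 range.
            quants = [max(-127, min(127, round(v / scale))) for v in chunk]
        blocks.append((scale, quants))
    return blocks


def dot_q8_0(a_blocks, b_blocks):
    """Dot product of two Q8_0 vectors: integer MACs, then one scale per block."""
    total = 0.0
    for (sa, qa), (sb, qb) in zip(a_blocks, b_blocks):
        isum = sum(x * y for x, y in zip(qa, qb))  # integer accumulate
        total += sa * sb * isum                    # apply both block scales
    return total
```

The inner loop is purely integer multiply-accumulate, with floating-point work reduced to one multiply per block. This is the usual reason 8-bit block formats are attractive for energy-constrained accelerators.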
Abstract
The rise of generative AI for tasks like Automatic Speech Recognition (ASR) has created a critical energy consumption challenge. While ASICs offer high efficiency, they lack the programmability to adapt to evolving algorithms. To address this trade-off, we implement and evaluate Whisper's core computational kernel on IMAX, a general-purpose Coarse-Grained Linear Array (CGLA) accelerator. To our knowledge, this is the first work to execute a Whisper kernel on a CGRA and compare its performance against CPUs and GPUs. Using hardware/software co-design, we evaluate our system via an FPGA prototype and project performance for a 28 nm ASIC. Our results demonstrate superior energy efficiency: for the Q8_0 model, the projected ASIC is 1.90x more energy-efficient than the NVIDIA Jetson AGX Orin and 9.83x more energy-efficient than an NVIDIA RTX 4090. This work positions CGLA as a promising platform for sustainable ASR on power-constrained edge devices.
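For reference, the energy-efficiency ratios above can be read in terms of the power-delay product. The formulation below is a sketch that assumes the reported factors are PDP ratios against each baseline. The symbols $P$ (average power), $T$ (execution time of the workload), and $R$ are introduced here for illustration and do not come from the paper.

```latex
\mathrm{PDP} = P \cdot T
\qquad
R_{\text{baseline}} = \frac{\mathrm{PDP}_{\text{baseline}}}{\mathrm{PDP}_{\text{IMAX}}}
                    = \frac{P_{\text{baseline}} \, T_{\text{baseline}}}{P_{\text{IMAX}} \, T_{\text{IMAX}}}
```

Under this reading, $R > 1$ means the projected IMAX ASIC spends less energy on the same workload, so the reported 1.90x and 9.83x would correspond to $R_{\text{Orin}} = 1.90$ and $R_{\text{4090}} = 9.83$ for the Q8_0 model. A device can be slower than a baseline and still win on PDP if its power draw is proportionally lower.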
