An FPGA-Based Reconfigurable Accelerator for Convolution-Transformer Hybrid EfficientViT
Haikuo Shao, Huihong Shi, Wendong Mao, Zhongfeng Wang
TL;DR
This work tackles the challenge of accelerating EfficientViT, a Convolution-Transformer hybrid ViT, on resource-limited devices by designing a reconfigurable FPGA accelerator. The architecture combines a Reconfigurable Processing Element and a MAT engine with a time-multiplexed, pipelined dataflow that fuses computations across the convolution blocks and the lightweight MSA. It supports depthwise convolutions (DWConvs), pointwise convolutions (PWConvs), and matrix multiplications (MatMuls), and introduces intra- and inter-layer fusion to reduce off-chip data transfers. Implemented on a Xilinx ZCU102 at 200 MHz, it achieves up to 780.2 GOPS throughput and 105.1 GOPS/W energy efficiency, outperforming CPU- and other accelerator-based baselines and advancing the practical deployment of EfficientViT.
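To make the three operation types concrete, the following minimal Python sketch gives plain reference loops for them. It illustrates the arithmetic only and is not the accelerator's dataflow; the channels-first layout, the 3×3 kernel with stride 1 and zero padding, and all function names are assumptions chosen for illustration. A PWConv reduces to a MatMul whose reduction runs over all input channels, whereas a DWConv reduces over only a k×k window within each channel, which suggests why one fixed compute array can be poorly utilized by one of the two.

```python
# Minimal reference loops (not the accelerator's dataflow) for the three
# operation types. Layout assumptions: channels-first, flattened pixels for
# PWConv/MatMul, 3x3 kernel with stride 1 and zero padding 1 for DWConv.

def matmul(a, b):
    """Plain MatMul: a is n x k, b is k x m, result is n x m."""
    k = len(b)
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(len(b[0]))]
            for i in range(len(a))]

def pwconv(x, w):
    """Pointwise (1x1) conv: x is C_in x P pixels, w is C_out x C_in.
    Every output sums over all C_in input channels, so it is exactly W @ X."""
    return matmul(w, x)

def dwconv3x3(x, w):
    """Depthwise 3x3 conv: x is C x H x W, w is C x 3 x 3.
    Channel c is filtered only by its own kernel w[c]; no cross-channel sum."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    out = [[[0] * W for _ in range(H)] for _ in range(C)]
    for c in range(C):
        for i in range(H):
            for j in range(W):
                acc = 0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < H and 0 <= jj < W:  # zero padding
                            acc += w[c][di + 1][dj + 1] * x[c][ii][jj]
                out[c][i][j] = acc
    return out

def macs_pwconv(c_in, c_out, pixels):
    """MACs for a PWConv: C_in multiply-accumulates per output element."""
    return c_in * c_out * pixels

def macs_dwconv(channels, pixels, k=3):
    """MACs for a DWConv: only k*k multiply-accumulates per output element."""
    return channels * pixels * k * k
```

For a 64-channel layer on a 14×14 map, `macs_pwconv(64, 64, 196)` gives 802,816 MACs while `macs_dwconv(64, 196)` gives 112,896, so the DWConv does about 7× less work and each of its outputs reduces over 9 terms instead of 64.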
Abstract
Vision Transformers (ViTs) have achieved significant success in computer vision. However, their intensive computation and massive memory footprint hinder their deployment on embedded devices, calling for efficient ViTs. Among them, EfficientViT, a state-of-the-art efficient ViT, features a Convolution-Transformer hybrid architecture that enhances both accuracy and hardware efficiency. Unfortunately, existing accelerators cannot fully exploit the hardware benefits of EfficientViT because of its unique architecture. In this paper, we propose an FPGA-based accelerator for EfficientViT to advance the hardware efficiency frontier of ViTs. Specifically, we design a reconfigurable architecture that efficiently supports various operation types, including lightweight convolutions and attention, boosting hardware utilization. Additionally, we present a time-multiplexed and pipelined dataflow that enables both intra- and inter-layer fusion, reducing off-chip data access costs. Experimental results show that our accelerator achieves up to 780.2 GOPS in throughput and 105.1 GOPS/W in energy efficiency at 200 MHz on the Xilinx ZCU102 FPGA, significantly outperforming prior works.
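The off-chip saving from inter-layer fusion can be estimated with a first-order byte count. The sketch below is a hypothetical model, not the paper's cost model: it counts only feature-map traffic (no weights, tiling, or partial-sum spills), assumes 8-bit activations, and uses a chain resembling an MBConv block (a PWConv expanding channels 4×, a 3×3 DWConv, and a PWConv projecting back). The channel count, resolution, expansion ratio, and the `Layer`, `offchip_bytes`, and `mbconv_like_chain` names are illustrative assumptions; intra-layer fusion is not modeled.

```python
from dataclasses import dataclass

# Hypothetical first-order model of off-chip feature-map traffic.
# Assumptions: weights, tiling, and spills are ignored; sizes are illustrative.

@dataclass
class Layer:
    name: str
    in_elems: int   # elements in the layer's input feature map
    out_elems: int  # elements in the layer's output feature map

def offchip_bytes(layers, fused, bytes_per_elem=1):
    """Feature-map bytes moved to/from off-chip memory (weights ignored)."""
    if not layers:
        return 0
    if fused:
        # Fused chain: read the first input and write the last output once;
        # intermediate feature maps are assumed to stay in on-chip buffers.
        elems = layers[0].in_elems + layers[-1].out_elems
    else:
        # Layer-by-layer: each layer reads its input from DRAM and writes
        # its output back.
        elems = sum(layer.in_elems + layer.out_elems for layer in layers)
    return elems * bytes_per_elem

def mbconv_like_chain(c, h, w, expand=4):
    """Hypothetical PWConv(expand) -> DWConv -> PWConv(project) chain."""
    hw = h * w
    return [
        Layer("pw_expand", c * hw, expand * c * hw),
        Layer("dw3x3", expand * c * hw, expand * c * hw),
        Layer("pw_project", expand * c * hw, c * hw),
    ]

if __name__ == "__main__":
    chain = mbconv_like_chain(c=16, h=8, w=8)
    print("unfused:", offchip_bytes(chain, fused=False))  # 18432
    print("fused:  ", offchip_bytes(chain, fused=True))   # 2048
```

With C = 16 and an 8×8 feature map, the unfused chain moves 18,432 bytes and the fused chain 2,048 bytes, a 9× reduction in this toy setting; real savings depend on the block shapes and on how much of each intermediate map actually fits on chip.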
