PIVOT- Input-aware Path Selection for Energy-efficient ViT Inference
Abhishek Moitra, Abhiroop Bhattacharjee, Priyadarshini Panda
TL;DR
PIVOT tackles the high latency and energy cost of ViT attention by making attention computation input-aware: instead of running every attention module on every input, it skips attention according to input difficulty. It uses entropy to route each input to either a low-effort or a high-effort ViT, and it selects the attention skip configurations of these paths with a two-phase hardware-in-the-loop search guided by a CKA-based Path-Score. The design is evaluated with the cycle-accurate PIVOT-Sim simulator and on an FPGA, as well as on CPUs and GPUs. The approach substantially reduces energy-delay product (EDP) with minimal accuracy loss (about 2.7x lower EDP at about 0.2% accuracy loss relative to LVViT-S) and outperforms prior token pruning and sparsification methods, while remaining general-purpose and open-source. Overall, PIVOT enables efficient, input-aware ViT inference without requiring specialized hardware, which eases practical deployment across diverse computing platforms.
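The text above describes the Path-Score only as CKA-based. As a minimal sketch of the underlying similarity measure, the Python below computes linear centered kernel alignment (CKA) between two feature matrices (rows are samples, columns are features). The `path_score` helper, which averages per-layer CKA between a reference ViT and a candidate attention skip path, is a hypothetical illustration and not the paper's definition of the Path-Score.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # n samples x d features


def center_columns(m: Matrix) -> Matrix:
    """Subtract each column's mean so every feature has zero mean over the samples."""
    n, d = len(m), len(m[0])
    means = [sum(row[j] for row in m) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in m]


def cross_frob_sq(a: Matrix, b: Matrix) -> float:
    """Squared Frobenius norm of A^T B for A (n x p) and B (n x q)."""
    n, p, q = len(a), len(a[0]), len(b[0])
    total = 0.0
    for i in range(p):
        for j in range(q):
            s = sum(a[k][i] * b[k][j] for k in range(n))
            total += s * s
    return total


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA: ||Xc^T Yc||_F^2 / (||Xc^T Xc||_F * ||Yc^T Yc||_F)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of samples")
    xc, yc = center_columns(x), center_columns(y)
    denom = math.sqrt(cross_frob_sq(xc, xc) * cross_frob_sq(yc, yc))
    if denom == 0.0:
        raise ValueError("CKA is undefined for constant features")
    return cross_frob_sq(xc, yc) / denom


def path_score(reference: Sequence[Matrix], candidate: Sequence[Matrix]) -> float:
    """HYPOTHETICAL score: mean per-layer linear CKA between a reference
    ViT's features and a candidate attention skip path's features."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("need one candidate feature matrix per reference layer")
    scores = [linear_cka(r, c) for r, c in zip(reference, candidate)]
    return sum(scores) / len(scores)
```

Linear CKA is invariant to isotropic scaling and orthogonal transformations of the features, which makes it a reasonable way to compare layers whose activations differ in scale or basis.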
Abstract
The attention module in vision transformers (ViTs) performs intricate spatial correlations and contributes significantly to both accuracy and delay. It is therefore important to modulate the number of attention modules according to the complexity of the input features to achieve optimal delay-accuracy tradeoffs. To this end, we propose PIVOT, a co-optimization framework that selectively skips attention based on input difficulty. PIVOT employs a hardware-in-the-loop co-search to obtain optimal attention skip configurations. Evaluations on the ZCU102 MPSoC FPGA show that PIVOT achieves 2.7x lower energy-delay product (EDP) at a 0.2% accuracy reduction compared to the LVViT-S ViT. On traditional CPUs and GPUs, PIVOT also achieves 1.3% higher accuracy and 1.8x higher throughput than prior works. The PIVOT project can be found at https://github.com/Intelligent-Computing-Lab-Yale/PIVOT.
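PIVOT adapts the amount of attention to input difficulty, and the TL;DR states that entropy routes each input between a low-effort and a high-effort ViT. The Python sketch below illustrates that idea under stated assumptions: difficulty is measured as the Shannon entropy of the low-effort model's softmax output, and `threshold`, `low_model`, `high_model`, and `route_input` are hypothetical placeholders rather than names from the PIVOT code.

```python
import math
from typing import Callable, List, Tuple

Logits = List[float]


def softmax(logits: Logits) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: List[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def route_input(
    x,
    low_model: Callable[[object], Logits],
    high_model: Callable[[object], Logits],
    threshold: float,
) -> Tuple[Logits, str]:
    """Run the low-effort path first and escalate to the high-effort path
    only when its prediction is uncertain (entropy above threshold).
    ASSUMPTION: output entropy of the low-effort model is the difficulty signal."""
    logits = low_model(x)
    if entropy(softmax(logits)) > threshold:
        return high_model(x), "high"
    return logits, "low"
```

In this sketch, easy inputs pay only for the low-effort model while hard inputs pay for both passes, so the hypothetical `threshold` directly trades accuracy against average energy and delay.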
