Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators
Yonggan Fu, Yongan Zhang, Yang Zhang, David Cox, Yingyan Celine Lin
TL;DR
This work tackles the problem of jointly optimizing networks, mixed-precision bitwidths, and accelerators to maximize both the task accuracy and the hardware efficiency of DNNs. It introduces Auto-NBA, a bi-level optimization framework that combines two key innovations: heterogeneous sampling for scalable, unbiased network-precision search, and a differentiable accelerator search engine that operates over a general chunk-based hardware template. Empirical results on CIFAR and ImageNet, across FPGA and ASIC platforms, show that Auto-NBA achieves substantially faster search and better accuracy–throughput/EDP trade-offs than state-of-the-art co-search, one-shot NAS, and hardware-aware NAS baselines. The method provides a scalable, generic tool for accelerating DNN accelerator development and offers practical insights into the joint design of networks, precision, and hardware.
Abstract
Maximizing the acceleration efficiency of deep neural networks (DNNs) requires jointly searching for, or designing, three distinct yet highly coupled aspects: the networks, the bitwidths, and the accelerators. However, the challenges of such a joint search have not yet been fully understood or addressed. The key challenges are (1) the dilemma between exploding memory consumption, caused by the huge joint space, and settling for sub-optimal designs; (2) the discrete nature of the accelerator design space, which is coupled with, yet different from, the network and bitwidth spaces; and (3) the chicken-and-egg problem of network-accelerator co-search: co-search requires operation-wise hardware costs, but these are unavailable during search because the optimal accelerator depends on the whole network, which is itself still unknown. To tackle these challenges and enable fast development of optimal DNN accelerators, we propose Auto-NBA, a framework that jointly searches for the Networks, Bitwidths, and Accelerators by efficiently localizing the optimal design within the huge joint design space for each target dataset and acceleration specification. Auto-NBA integrates a heterogeneous sampling strategy that achieves unbiased search with constant memory consumption, and a novel joint-search pipeline equipped with a generic differentiable accelerator search engine. Extensive experiments and ablation studies validate that both the networks and the accelerators generated by Auto-NBA consistently outperform state-of-the-art designs (including co-search/exploration techniques, hardware-aware NAS methods, and DNN accelerators) in terms of search time, task accuracy, and accelerator efficiency. Our code is available at: https://github.com/RICE-EIC/Auto-NBA.
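The memory dilemma in challenge (1), and the claim of constant memory consumption, can be illustrated with a generic sketch. This is not Auto-NBA's implementation. It assumes Gumbel-Softmax as the differentiable relaxation, and every function name below is hypothetical. It only contrasts a soft mixture, which must evaluate every candidate choice, with a hard one-path sample, which evaluates a single candidate regardless of how many exist. How Auto-NBA actually assigns soft or hard sampling across network and bitwidth choices is specified in the paper, not in this abstract.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Soft Gumbel-Softmax: a differentiable relaxation of picking one choice (assumed here)."""
    # Gumbel(0, 1) noise g = -log(-log(u)); clamp u away from 0 to avoid log(0).
    noisy = [(l - math.log(-math.log(max(rng.random(), 1e-12)))) / tau for l in logits]
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def soft_mixed_forward(x, candidates, weights):
    """Weighted sum over ALL candidates: memory grows with len(candidates)."""
    outputs = [op(x) for op in candidates]  # every candidate output is kept alive
    return sum(w * y for w, y in zip(weights, outputs)), len(outputs)


def hard_sampled_forward(x, candidates, weights):
    """Run only the highest-weight candidate: memory is independent of len(candidates)."""
    k = max(range(len(weights)), key=lambda i: weights[i])
    return candidates[k](x), 1  # exactly one candidate output is materialized


# Usage with hypothetical candidates (e.g., one per operator type or bitwidth).
rng = random.Random(0)
candidates = [lambda x, s=s: s * x for s in (0.5, 1.0, 2.0, 4.0)]
logits = [0.1, 0.2, 1.5, 0.3]
w = gumbel_softmax(logits, tau=1.0, rng=rng)
soft_out, n_soft = soft_mixed_forward(3.0, candidates, w)
hard_out, n_hard = hard_sampled_forward(3.0, candidates, w)
print(n_soft, n_hard)  # 4 1
```

In an autograd framework the hard path would typically pass gradients back to the sampling logits through a straight-through estimator; plain Python here only counts how many candidate outputs each strategy materializes.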
