DNA: Differentiable Network-Accelerator Co-Search

Yongan Zhang, Yonggan Fu, Weiwen Jiang, Chaojian Li, Haoran You, Meng Li, Vikas Chandra, Yingyan Celine Lin

TL;DR

DNA tackles the joint optimization of DNN architectures and accelerator micro-architectures under resource constraints. It introduces a differentiable network search (DNS) and a differentiable accelerator search (DAS) operating on a Generic DNN Accelerator Design Space (GADS) to optimize the objective $L_{val}(\omega^*, NET(\alpha)) + \lambda L_{hw}(NET(\alpha), HW(\gamma^*))$. The DNS uses a differentiable NAS formulation with $A_l = \sum_{k=1}^{K} \alpha_{lk} O_{lk}(A_{l-1})$, while the DAS applies Gumbel-Softmax sampling over accelerator parameters to estimate the hardware cost, enabling gradient-based joint optimization. Experiments on FPGA and ASIC show that DNA-generated networks and accelerators outperform SOTA baselines (e.g., $3.04\times$ FPS with $+5.46\%$ accuracy on ImageNet) and reduce search time by up to $\sim 10^3\times$, demonstrating the practicality of automated, joint co-design for DNN accelerators.
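The two formulas in the TL;DR can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy candidate operators, and the use of a softmax to turn architecture logits into the weights $\alpha_{lk}$ are assumptions made for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax; used here (as an assumption) to derive alpha_lk."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_layer(alpha_l, candidate_ops, prev_activation):
    """Compute A_l = sum_k alpha_lk * O_lk(A_{l-1}) for one layer.

    alpha_l: list of K weights, one per candidate operator.
    candidate_ops: list of K callables mapping a vector to a same-length vector.
    prev_activation: list of floats standing in for A_{l-1}.
    """
    outputs = [op(prev_activation) for op in candidate_ops]
    return [
        sum(w * out[i] for w, out in zip(alpha_l, outputs))
        for i in range(len(prev_activation))
    ]


def co_search_objective(val_loss, hw_cost, lam):
    """Combine the two terms as L_val + lambda * L_hw."""
    return val_loss + lam * hw_cost


# Toy candidate operators (hypothetical stand-ins for real DNN operators).
ops = [
    lambda x: list(x),               # identity
    lambda x: [2.0 * v for v in x],  # scale by 2
    lambda x: [0.0 for _ in x],      # zero / skip
]

alpha = softmax([0.0, 0.0, 0.0])     # uniform weights over K = 3 operators
layer_out = mixed_layer(alpha, ops, [1.0, 2.0])
total_loss = co_search_objective(val_loss=0.9, hw_cost=5.0, lam=0.1)
```

Because the layer output is a weighted sum rather than a hard choice among operators, the weights can receive gradients from both terms of the objective, which is what makes the search differentiable.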

Abstract

Powerful yet complex deep neural networks (DNNs) have fueled a booming demand for efficient DNN solutions to bring DNN-powered intelligence into numerous applications. Jointly optimizing the networks and their accelerators is promising in providing optimal performance. However, the great potential of such solutions has yet to be unleashed due to the challenge of simultaneously exploring the vast and entangled, yet different, design spaces of the networks and their accelerators. To this end, we propose DNA, a Differentiable Network-Accelerator co-search framework for automatically searching for matched networks and accelerators to maximize both the task accuracy and acceleration efficiency. Specifically, DNA integrates two enablers: (1) a generic design space for DNN accelerators that is applicable to both FPGA- and ASIC-based DNN accelerators and compatible with DNN frameworks such as PyTorch to enable algorithmic exploration for more efficient DNNs and their accelerators; and (2) a joint DNN network and accelerator co-search algorithm that enables simultaneously searching for optimal DNN structures and their accelerators' micro-architectures and mapping methods to maximize both the task accuracy and acceleration efficiency. Experiments and ablation studies based on FPGA measurements and ASIC synthesis show that the matched networks and accelerators generated by DNA consistently outperform state-of-the-art (SOTA) DNNs and DNN accelerators (e.g., 3.04x better FPS with a 5.46% higher accuracy on ImageNet), while requiring notably reduced search time (up to 1234.3x) over SOTA co-exploration methods, when evaluated over ten SOTA baselines on three datasets. All code will be released upon acceptance.

Paper Structure

This paper contains 19 sections, 5 equations, 9 figures, 5 tables, 1 algorithm.

Figures (9)

  • Figure 1: An illustration of our DNA co-search framework, which accepts target tasks and accelerator specifications and then automatically generates matched DNNs and their accelerators to maximize both task accuracy and hardware efficiency.
  • Figure 2: FPGA-measured frames per second (FPS; see the left axis) on a ZC706 FPGA \cite{zc706} and CIFAR-100 accuracy (see the right colorbar) of 300 randomly sampled networks from the FBNet \cite{wu2019fbnet} search space, when each of the networks is accelerated by 300 randomly sampled accelerators from a generic accelerator design space, leading to a total of $9 \times 10^4$ randomly sampled data points in this figure. Designs with $ACC > 73.5\%$ and $FPS > 45$ are marked as stars, which are extremely sparse in the search space.
  • Figure 3: DNA's differentiable co-search of the joint network and accelerator space, where the DAS engine (right) optimizes the accelerator parameters according to Eq. \ref{eq:update_hw} and Eq. \ref{eqn:obj_hw} for the input networks $NET(\alpha)$ and returns the corresponding hardware cost to the DNS engine (left), which uses it to penalize costly operators. Here $\gamma^{rf}_{order}$ denotes the accelerator parameter for the loop order in the register file (RF), and similar notation is adopted for the other accelerator parameters in Tab. \ref{tab:hw_space}.
  • Figure 4: DNA-generated FPGA-based accelerators compared with those of the SOTA co-exploration works HS-CO-Opt \cite{jiang2019hardware} and BSW \cite{abdelfattah2020best}, where we adopt the same DSP limits as the baselines, i.e., 450/512/450 on CIFAR-10/100/ImageNet, respectively.
  • Figure 5: Accuracy vs. EDP of DNA-generated ASIC-based accelerators compared with three SOTA co-exploration designs on CIFAR-10 in NASAIC \cite{yang2020co}.
  • ...and 4 more figures
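
The DAS step described in the Figure 3 caption (relax a discrete accelerator parameter, then return a hardware cost to the DNS engine) can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's cost model: the per-choice cost table, the three loop-order options, the temperature, and all function names are hypothetical.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Draw one relaxed one-hot sample over the choices of a discrete parameter."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1);
        # U is clamped away from 0 and 1 to keep both logarithms finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def relaxed_hw_cost(gamma_logits, choice_costs, temperature, rng):
    """Weight each choice's cost by its relaxed sample to get a soft hardware cost."""
    probs = gumbel_softmax(gamma_logits, temperature, rng)
    return sum(p * c for p, c in zip(probs, choice_costs))


# Hypothetical example: three loop-order options for the register file,
# each with a made-up cost (e.g., latency in arbitrary units).
rng = random.Random(0)
gamma_rf_order = [0.0, 1.0, -1.0]   # logits for the accelerator parameter
costs = [4.0, 2.5, 6.0]             # assumed cost of each option
hw_cost = relaxed_hw_cost(gamma_rf_order, costs, temperature=1.0, rng=rng)
```

In an automatic-differentiation framework the same relaxation lets the cost be differentiated with respect to the logits, so the accelerator parameters can be updated by gradient descent; lowering the temperature pushes the sample closer to a hard one-hot choice.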