NASH: Neural Architecture and Accelerator Search for Multiplication-Reduced Hybrid Models

Yang Xu; Huihong Shi; Zhongfeng Wang

TL;DR

This work proposes NASH, a Neural architecture and Accelerator Search framework for multiplication-reduced Hybrid models. It introduces a tailored zero-shot metric to pre-identify promising hybrid models before training, improving search efficiency while alleviating gradient conflicts.

Abstract

The significant computational cost of multiplications hinders the deployment of deep neural networks (DNNs) on edge devices. While multiplication-free models offer enhanced hardware efficiency, they typically sacrifice accuracy. As a solution, multiplication-reduced hybrid models have emerged to combine the benefits of both approaches. In particular, prior works, namely NASA and NASA-F, leverage Neural Architecture Search (NAS) to construct such hybrid models, enhancing hardware efficiency while maintaining accuracy. However, they either entail costly retraining or encounter gradient conflicts, limiting both search efficiency and accuracy. Additionally, they overlook the acceleration opportunity introduced by accelerator search, yielding sub-optimal hardware performance. To overcome these limitations, we propose NASH, a Neural architecture and Accelerator Search framework for multiplication-reduced Hybrid models. Specifically, for NAS, we propose a tailored zero-shot metric to pre-identify promising hybrid models before training, enhancing search efficiency while alleviating gradient conflicts. For accelerator search, we introduce a coarse-to-fine search to streamline the search process. Furthermore, we integrate these two levels of search into NASH to obtain the optimal model and accelerator pairing. Experiments validate the effectiveness of NASH. For example, compared with the state-of-the-art multiplication-based system, we achieve $\uparrow 2.14\times$ throughput and $\uparrow 2.01\times$ FPS with $\uparrow 0.25\%$ accuracy on CIFAR-100, and $\uparrow 1.40\times$ throughput and $\uparrow 1.19\times$ FPS with $\uparrow 0.56\%$ accuracy on Tiny-ImageNet. Code is available at \url{https://github.com/xuyang527/NASH}.
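To make "multiplication-reduced" concrete, the sketch below contrasts a standard multiply-accumulate with the two multiplication-free operators that hybrid models of this kind typically mix in: a shift operator (weights restricted to signed powers of two) and an adder operator (negative L1 distance instead of a dot product). This is an illustrative assumption based on how shift and adder layers are commonly defined, not the authors' implementation; the function names `conv_dot`, `shift_dot`, and `adder_dot` are hypothetical, and integer inputs stand in for fixed-point activations.

```python
def conv_dot(x, w):
    """Standard multiply-accumulate: one multiplication per weight."""
    return sum(xi * wi for xi, wi in zip(x, w))


def shift_dot(x, signs, shifts):
    """Shift operator (assumed form): each weight is sign * 2**shift.

    The multiplication becomes a bit shift plus an add or subtract.
    Negative shifts use a right shift, which floors the result.
    """
    acc = 0
    for xi, sign, p in zip(x, signs, shifts):
        shifted = xi << p if p >= 0 else xi >> -p
        if sign > 0:
            acc += shifted
        elif sign < 0:
            acc -= shifted
        # sign == 0 encodes a zero weight and contributes nothing
    return acc


def adder_dot(x, w):
    """Adder operator (assumed form): negative L1 distance, no multiplications."""
    return -sum(abs(xi - wi) for xi, wi in zip(x, w))


# Toy integer inputs chosen purely for illustration.
x = [4, -8, 2]
w = [2, -1, 4]                        # equals sign * 2**shift below
signs, shifts = [1, -1, 1], [1, 0, 2]

print(conv_dot(x, w))                 # multiplication-based result
print(shift_dot(x, signs, shifts))    # same value, computed with shifts
print(adder_dot(x, w))                # a different similarity measure
```

When the weights are exact powers of two, the shift operator reproduces the multiplication-based result; the adder operator computes a different quantity altogether, which is one reason purely multiplication-free models tend to lose accuracy and hybrid designs keep some convolutions.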

Paper Structure (33 sections, 6 equations, 8 figures, 11 tables, 1 algorithm)


Introduction
Related Works
Multiplication-Reduced DNNs
Neural Architecture Search (NAS)
Accelerator Search
The Neural Architecture Search
The Hybrid Search Space
Zero-Shot Search
The Tailored Zero-shot Metric
Trainability
Expressivity
Overall
Neural Architecture Search
Preference-Biased Supernet Training
The Accelerator Search
...and 18 more sections

Figures (8)

Figure 1: Pipelines of (a) one-shot supernet-based NAS (BigNAS, AlphaNet) and (b) a zero-shot NAS framework combined with preference-biased supernet training (wang2023prenas).
Figure 2: The overview of our NASH framework, where we integrate both the neural architecture search (NAS) and the coarse-to-fine accelerator search to directly obtain the optimal pairing of models and accelerators. Specifically, the NAS uses a tailored zero-shot metric to pre-identify promising multiplication-reduced hybrid models before supernet training, while the accelerator search adopts a novel coarse-to-fine strategy to expedite the search process.
Figure 3: Correlations between model accuracy and zero-shot metrics, including (a) SNIP, (b) NN-Degree, (c) Zen-Score, and (d) our tailored zero-shot metric, when measured on multiplication-reduced hybrid models.
Figure 4: (a) The accelerator micro-architecture, which leverages distinct hardware resources on FPGAs to build tailored chunks, dubbed Chunk-C, Chunk-S, and Chunk-A, that support the convolution, shift, and adder layers within the searched hybrid models, respectively. (b) Mapping methods (i.e., dataflows) for the chunks, expressed in the widely adopted for-loop description (eyeriss). Components labeled in orange denote the searchable elements within our accelerator search space.
Figure 5: The processing timeline of our chunk-based accelerator, using a hybrid model with four layers (Conv1, Shift2, Conv3, and Adder4) as an example. Due to the limited DSP resources available on FPGAs, the DSP-based Chunk-C dominates the overall latency of our accelerator.
...and 3 more figures
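The Figure 3 caption compares zero-shot metrics by how well their scores correlate with model accuracy. One common way to quantify such agreement is a rank correlation such as Kendall's tau; whether the paper reports this particular statistic is not stated in this listing, so the sketch below is only an assumption about how a comparison of this kind can be computed. The function name `kendall_tau` and all numeric values are hypothetical placeholders, not results from the paper.

```python
from itertools import combinations


def kendall_tau(scores, accuracies):
    """Kendall's tau-a between metric scores and measured accuracies.

    Counts model pairs ranked in the same order by both lists
    (concordant) versus opposite order (discordant). Ties count as neither.
    """
    if len(scores) != len(accuracies):
        raise ValueError("scores and accuracies must have the same length")
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two models to rank")
    concordant = discordant = 0
    for (s1, a1), (s2, a2) in combinations(zip(scores, accuracies), 2):
        direction = (s1 - s2) * (a1 - a2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) // 2)


# Made-up placeholder values for four candidate models.
metric_scores = [0.1, 0.4, 0.3, 0.9]
accuracies = [60.0, 65.0, 66.0, 70.0]
print(round(kendall_tau(metric_scores, accuracies), 3))
```

A value near 1 means the metric orders candidate models almost exactly as their trained accuracy would, which is the property a zero-shot metric needs in order to pre-identify promising models before training.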
