2-in-1 Accelerator: Enabling Random Precision Switch for Winning Both Adversarial Robustness and Efficiency

Yonggan Fu, Yang Zhao, Qixuan Yu, Chaojian Li, Yingyan Celine Lin

TL;DR

This work addresses the need for DNN accelerators that are both robust to adversarial attacks and hardware-efficient enough for IoT devices. It introduces the Random Precision Switch (RPS) algorithm, which defends a model by randomly switching its quantization precision in situ and thereby also enables runtime robustness-efficiency trade-offs. RPS is paired with a precision-scalable MAC architecture that spatially tiles temporal MAC units, combining the two design principles. An automated optimizer searches dataflows and micro-architectures to maximize efficiency across precisions. Extensive experiments and ablation studies show gains in robust accuracy, energy efficiency, and throughput over existing baselines. The resulting accelerator can trade robustness for efficiency on resource-constrained devices without retraining the DNN.

Abstract

The recent breakthroughs of deep neural networks (DNNs) and the advent of billions of Internet of Things (IoT) devices have triggered an explosive demand for intelligent IoT devices equipped with domain-specific DNN accelerators. However, deploying DNN-accelerator-enabled intelligent functionality in real-world IoT devices remains particularly challenging. First, powerful DNNs often come at prohibitive complexity, whereas IoT devices often suffer from stringent resource constraints. Second, DNNs are vulnerable to adversarial attacks, especially on IoT devices exposed to complex real-world environments, yet many IoT applications require strict security. Existing DNN accelerators mostly tackle only one of these two challenges (i.e., efficiency or adversarial robustness) while neglecting or even sacrificing the other. To this end, we propose the 2-in-1 Accelerator, an integrated algorithm-accelerator co-design framework that aims to win both the adversarial robustness and the efficiency of DNN accelerators. Specifically, we first propose a Random Precision Switch (RPS) algorithm that can effectively defend DNNs against adversarial attacks by enabling random DNN quantization as an in-situ model switch. Furthermore, we propose a new precision-scalable accelerator featuring (1) a new precision-scalable MAC unit architecture that spatially tiles temporal MAC units to boost both the achievable efficiency and flexibility and (2) a systematically optimized dataflow that is searched by our generic accelerator optimizer. Extensive experiments and ablation studies validate that our 2-in-1 Accelerator not only aggressively boosts both the adversarial robustness and the efficiency of DNN accelerators under various attacks, but also naturally supports instantaneous robustness-efficiency trade-offs that adapt to varied resources without requiring DNN retraining.
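The core idea of RPS described in the abstract, drawing a random precision at inference time so that the quantized model acts as an in-situ model switch, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration rather than the authors' implementation: the symmetric uniform quantizer, the per-row weight scaling, the candidate precision set, and the names `quantize` and `rps_linear` are all hypothetical.

```python
import random


def quantize(values, bits):
    """Symmetric uniform quantization of a list of floats to `bits` bits.

    Assumed quantizer for illustration; the paper's scheme may differ.
    """
    qmax = 2 ** (bits - 1) - 1           # largest integer level
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)              # nothing to scale
    scale = max_abs / qmax
    # Round to the nearest level, clamp, and map back to real values.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def rps_linear(x, weight_rows, precisions, rng):
    """One linear layer evaluated at a randomly drawn precision.

    `precisions` is a hypothetical candidate set; a fresh precision is
    drawn on every call, so repeated calls behave like different models.
    """
    bits = rng.choice(precisions)
    xq = quantize(x, bits)
    out = []
    for row in weight_rows:
        wq = quantize(row, bits)         # per-row scaling (assumption)
        out.append(sum(a * b for a, b in zip(xq, wq)))
    return out, bits


rng = random.Random(0)
x = [0.5, -1.0, 0.25]
W = [[0.1, 0.2, -0.3], [0.7, -0.4, 0.0]]
y, bits = rps_linear(x, W, precisions=[4, 5, 6, 7, 8], rng=rng)
print(bits, y)
```

Restricting `precisions` to lower bit-widths would favor efficiency, which is one way to picture the runtime robustness-efficiency trade-off the abstract mentions.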

Paper Structure

This paper contains 32 sections, 3 equations, 11 figures, 6 tables, and 2 algorithms.

Figures (11)

  • Figure 1: Visualizing the transferability of adversarial attacks between different precisions, where the robust accuracy under different training methods (PGD-7 and FGSM-RS) and attacks (PGD-20 and CW-Inf) is annotated.
  • Figure 2: Throughput under different precisions of Bit Fusion and Stripes for accelerating ResNet-50 on ImageNet.
  • Figure 3: Area breakdown of the MAC units based on SOTA temporal/spatial designs and our proposed design.
  • Figure 4: The MAC units of the temporal design, the spatial design, and our spatial-temporal design, which spatially tiles the temporal units to marry the advantages of both temporal and spatial designs for variable-precision execution. For 8-bit weights and inputs in this case, the temporal, spatial, and our design take 8 cycles, 1 cycle, and 4 cycles, respectively.
  • Figure 5: Reorganizing the bit-level split and allocation reduces the number of shifters by 1/$n$ ($n$=4 in this case, denoting the number of partial sums) when handling $2m$-bit inputs and weights. Here $a_{i}^{L}$/$b_{i}^{L}$ denotes the $m$ least significant bits (LSBs) of the input/weight in the $i$-th partial sum, and $a_{i}^{H}$/$b_{i}^{H}$ denotes the remaining most significant bits (MSBs).
  • ...and 6 more figures
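
The bit-level split in the Figure 5 caption can be made concrete with a small arithmetic sketch. This is one plausible reading of the caption, not the authors' hardware: it assumes unsigned $2m$-bit operands, and the function names are hypothetical. Each operand is split into high and low $m$-bit halves, so $a_i b_i = a_i^H b_i^H 2^{2m} + (a_i^H b_i^L + a_i^L b_i^H) 2^m + a_i^L b_i^L$. Accumulating the like terms across the $n$ partial sums before shifting means the number of shifts no longer grows with $n$.

```python
def split(v, m):
    """Split a non-negative 2m-bit integer into (high, low) m-bit halves."""
    return v >> m, v & ((1 << m) - 1)


def dot_shift_per_product(a, b, m):
    """Shift every m-bit sub-product individually, then accumulate."""
    total = 0
    for ai, bi in zip(a, b):
        ah, al = split(ai, m)
        bh, bl = split(bi, m)
        total += ((ah * bh) << (2 * m)) + ((ah * bl) << m) \
                 + ((al * bh) << m) + al * bl
    return total


def dot_shift_after_sum(a, b, m):
    """Accumulate like sub-products first, then shift each group once."""
    hh = cross = ll = 0
    for ai, bi in zip(a, b):
        ah, al = split(ai, m)
        bh, bl = split(bi, m)
        hh += ah * bh
        cross += ah * bl + al * bh
        ll += al * bl
    return (hh << (2 * m)) + (cross << m) + ll


# n = 4 partial sums of 8-bit operands (m = 4), values chosen arbitrarily.
a = [200, 17, 255, 3]
b = [13, 250, 255, 128]
reference = sum(x * y for x, y in zip(a, b))
assert dot_shift_per_product(a, b, 4) == reference
assert dot_shift_after_sum(a, b, 4) == reference
```

Both functions compute the same dot product; only the placement of the shifts differs, which is the kind of reorganization the caption credits with the shifter savings.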