SigDLA: A Deep Learning Accelerator Extension for Signal Processing

Fangfa Fu; Wenyu Zhang; Zesong Jiang; Zhiyu Zhu; Guoyu Li; Bing Yang; Cheng Liu; Liyi Xiao; Jinxiang Wang; Huawei Li; Xiaowei Li

TL;DR

The paper addresses the challenge of running signal processing alongside deep learning on IoT processors, where DSP-based designs lack native hardware support for deep learning. It introduces SigDLA, a unified accelerator that extends a conventional DLA with a programmable data shuffling fabric between the input buffer and the computing array, plus a variable-bitwidth compute path. Key contributions include the data shuffling fabric, which converts irregular signal-processing access patterns (such as FFT butterflies) into regular tensor operations; a serial, 4-bit-based computing array that supports 8- and 16-bit operations; and instructions that drive shuffling and padding. Together, they map FFT, FIR, DCT, and DWT to convolution-like workloads. Evaluation shows average speedups of $4.4\times$, $1.4\times$, and $1.52\times$ and average energy reductions of $4.82\times$, $3.27\times$, and $2.15\times$ over an embedded ARM processor with customized DSP instructions, a DSP processor, and an independent DSP-DLA architecture, respectively, at a cost of $17\%$ more chip area than the original DLA.
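To make the variable-bitwidth idea concrete, the following minimal sketch shows one generic way to build an 8- or 16-bit multiplication out of 4-bit by 4-bit partial products. It is an illustration under stated assumptions, not the paper's micro-architecture: the function names `split_nibbles` and `mul_from_4bit` are hypothetical, operands are treated as unsigned, and the serial scheduling of the partial products is not modeled.

```python
def split_nibbles(value, bits):
    """Split an unsigned `bits`-wide integer into 4-bit digits, least significant first."""
    assert bits % 4 == 0 and 0 <= value < (1 << bits)
    return [(value >> shift) & 0xF for shift in range(0, bits, 4)]


def mul_from_4bit(a, b, bits):
    """Multiply two unsigned `bits`-wide integers using only 4-bit x 4-bit products.

    Assumption: each inner product stands in for one 4-bit multiplier, and the
    shift-and-add loop stands in for whatever accumulation the hardware uses.
    """
    total = 0
    for i, a_nib in enumerate(split_nibbles(a, bits)):
        for j, b_nib in enumerate(split_nibbles(b, bits)):
            partial = a_nib * b_nib            # at most 15 * 15 = 225
            total += partial << (4 * (i + j))  # weight by nibble positions
    return total
```

An 8-bit multiply needs 4 such partial products and a 16-bit multiply needs 16, which is why a 4-bit array can trade throughput for operand width.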

Abstract

Deep learning and signal processing are closely coupled in many IoT scenarios, such as anomaly detection, where together they enable intelligent devices. Many IoT processors use digital signal processors (DSPs) for signal processing and build deep learning frameworks on top of them. Although deep learning is usually much more compute-intensive than signal processing, its computing efficiency on DSPs is limited by the lack of native hardware support. We therefore take the opposite approach and propose to enable signal processing on top of a classical deep learning accelerator (DLA). Observing that irregular data patterns, such as the butterfly operations in FFT, are the major barrier to deploying signal processing on DLAs, we propose a programmable data shuffling fabric and insert it between the input buffer and the computing array of the DLA, so that irregular data is reorganized and the processing becomes regular. With this online data shuffling, the proposed architecture, SigDLA, can adapt to various signal processing tasks without affecting deep learning processing. Moreover, we build a reconfigurable computing array to suit the different data width requirements of signal processing and deep learning. According to our experiments, SigDLA achieves average performance speedups of $4.4\times$, $1.4\times$, and $1.52\times$, and average energy reductions of $4.82\times$, $3.27\times$, and $2.15\times$, compared to an embedded ARM processor with customized DSP instructions, a DSP processor, and an independent DSP-DLA architecture, respectively, with 17% more chip area than the original DLA.
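The idea of turning irregular butterfly accesses into regular processing can be sketched in software. The example below is a minimal illustration, not the SigDLA fabric or its instruction set: the function names are hypothetical, and a plain radix-2 decimation-in-time FFT is assumed. Each stage first gathers the butterfly operands into two contiguous vectors (the "shuffle"), then applies the same multiply-add to every element pair (the "regular" part).

```python
import cmath


def bit_reverse_indices(n):
    """Return the bit-reversal permutation for a power-of-two length n >= 2."""
    bits = n.bit_length() - 1
    return [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)]


def fft_via_shuffle(signal):
    """Radix-2 FFT written as 'shuffle, then regular elementwise compute' per stage."""
    n = len(signal)
    assert n >= 2 and n & (n - 1) == 0, "length must be a power of two"
    # Initial shuffle: reorder the input by bit-reversed index.
    x = [complex(signal[i]) for i in bit_reverse_indices(n)]
    half = 1
    while half < n:
        # Shuffle step: compute which elements pair up in this stage.
        top_idx = [k + j for k in range(0, n, 2 * half) for j in range(half)]
        twiddle = [cmath.exp(-1j * cmath.pi * (i % (2 * half)) / half)
                   for i in top_idx]
        top = [x[i] for i in top_idx]         # gathered upper operands
        bot = [x[i + half] for i in top_idx]  # gathered lower operands
        # Regular step: identical multiply-add on every gathered pair.
        prod = [w * b for w, b in zip(twiddle, bot)]
        for i, t, p in zip(top_idx, top, prod):
            x[i] = t + p
            x[i + half] = t - p
        half *= 2
    return x
```

Once the operands are gathered, every stage performs the same dense multiply-accumulate pattern over contiguous vectors, which is the kind of workload a DLA computing array handles well.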

Paper Structure (23 sections, 10 figures, 2 tables)

This paper contains 23 sections, 10 figures, 2 tables.

Introduction
Related Work & Motivation
Related Work
Motivation
SigDLA Architecture
Variable Bitwidth Computing Array
Mapping Variable Bitwidth Operations
Micro-Architecture
Programmable Data Shuffling
Mapping Signal Processing Operations to Convolution
Micro-Architecture
Buffer Controller Interface
Data Shuffling Unit
Data Padding Unit
Shuffling Instructions
...and 8 more sections

Figures (10)

Figure 1: SigDLA Architecture Overview.
Figure 2: Implementation of Variable Bitwidth Computing Array.
Figure 3: Mapping different signal processing algorithms to convolution.
Figure 4: The Micro-Architecture of Shuffling Fabric.
Figure 5: Shuffling Instruction.
...and 5 more figures
