From Circuits to SoC Processors: Arithmetic Approximation Techniques & Embedded Computing Methodologies for DSP Acceleration

Vasileios Leon

TL;DR

This dissertation tackles the energy/throughput bottleneck in DSP and AI workloads by combining Approximate Computing with hardware acceleration and heterogeneity. It develops a broad design space of arithmetic approximations (notably DLSB, hybrid high-radix RAD, and cooperative ROUP/RADR encodings) and runtime-configurable architectures (AxFXU/AxFPU, DyFXU/DyFPU) that trade accuracy for substantial energy/area gains while maintaining acceptable QoS. At the platform level, it demonstrates kernel mappings and accelerators on ASICs and FPGAs (including space-grade NanoXplore FPGAs) and on heterogeneous VPUs (Intel Myriad), delivering up to around 63% energy savings and up to 20x gains in DSP kernels, 8.5x–12x improvements for CV pipelines, and meaningful DNN acceleration without retraining. The work also provides systematic methodologies for porting complex computer-vision kernels to space-grade FPGAs and for exploiting VPU heterogeneity, establishing practical workflows for edge AI in demanding environments. Overall, the contributions form a framework for energy-efficient DSP/AI hardware accelerators that spans low-level circuit techniques and high-level platform mappings, with direct relevance to space and edge computing.

Abstract

The computing industry is compelled to seek alternative design approaches and computing platforms that sustain gains in power efficiency while still providing sufficient performance. Among the solutions examined, Approximate Computing, Hardware Acceleration, and Heterogeneous Computing have gained significant momentum. In this Dissertation, we introduce design solutions and methodologies, built on these computing paradigms, for the development of energy-efficient DSP and AI accelerators. In particular, we adopt the paradigm of Approximate Computing and apply new approximation techniques to the design of arithmetic circuits. The proposed arithmetic approximation techniques involve bit-level optimizations, inexact operand encodings, and skipping of computations, and they are applied to both fixed- and floating-point arithmetic. We also conduct an extensive exploration of combinations of the approximation techniques and propose a low-overhead scheme for seamlessly adjusting the approximation degree of our circuits at runtime. Based on our methodology, these arithmetic approximation techniques are then combined with hardware design techniques to implement approximate ASIC- and FPGA-based DSP and AI accelerators. Moreover, we propose methodologies for the efficient mapping of DSP/AI kernels onto two distinct classes of embedded devices, namely space-grade FPGAs and heterogeneous VPUs. For the former, we cope with the decreased flexibility of space-grade technology and the technical challenges that arise in new FPGA tools. For the latter, we unlock the full potential of heterogeneity by exploiting all the diverse processors and memories. Using these methodologies, we efficiently map computer vision algorithms onto NanoXplore's radiation-hardened FPGAs and accelerate DSP and CNN kernels on Intel's Myriad VPUs.
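
To make the accuracy-versus-cost trade-off mentioned in the abstract concrete, the following is a minimal sketch of a generic bit-level approximation: an unsigned multiplier that zeroes the `k` least significant bits of each operand before multiplying. This is not one of the dissertation's techniques (it is not DLSB, RAD, or ROUP); the function names, the 8-bit operand width, and the error metric are assumptions chosen only for illustration. The parameter `k` plays the role of an approximation degree that could be changed at runtime, with `k = 0` giving the exact product.

```python
def truncate_lsbs(x: int, k: int) -> int:
    """Zero the k least significant bits of a non-negative integer."""
    return (x >> k) << k


def approx_mul(a: int, b: int, k: int) -> int:
    """Multiply after dropping k LSBs from each operand (k = 0 is exact).

    Hypothetical illustration only: plain operand truncation, not a
    technique taken from the dissertation.
    """
    return truncate_lsbs(a, k) * truncate_lsbs(b, k)


def mean_relative_error(k: int, bits: int = 8) -> float:
    """Exhaustive mean relative error over all non-zero unsigned operand pairs."""
    total = 0.0
    count = 0
    for a in range(1, 1 << bits):
        for b in range(1, 1 << bits):
            exact = a * b
            total += abs(exact - approx_mul(a, b, k)) / exact
            count += 1
    return total / count


if __name__ == "__main__":
    # Sweep the approximation degree and report the resulting error.
    for k in range(5):
        print(f"k={k}  mean relative error={mean_relative_error(k):.4f}")
```

In hardware, truncated operand bits mean fewer partial products to generate and accumulate, which is where area and energy savings would come from; this software model captures only the error side of that trade-off.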

Paper Structure (168 sections, 49 equations, 70 figures, 43 tables)

Figures (70)

  • Figure 1: Number of connected IoT devices from 2015 to 2025. Source: IoT Analytics, https://iot-analytics.com/number-connected-iot-devices/.
  • Figure 2: 50-year trends in microprocessors. Source: Karl Rupp, https://github.com/karlrupp/microprocessor-trend-data.
  • Figure 3: High-level architecture of (a) the Eyeriss ASIC [eyeriss] and (b) the Xilinx Zynq-7000 SoC FPGA [zynqs].
  • Figure 4: Evolution of heterogeneous computing architectures [moore_future]: (a) homogeneous CPU (past), (b) CPU + GPU/DSP (present), (c) CPU + GPU/DSP + accelerators (present), and (d) extreme heterogeneity in processors and memories (future).
  • Figure 5: The structure of the Ph.D. Dissertation.
  • ...and 65 more figures