PIMSYN: Synthesizing Processing-in-memory CNN Accelerators

Wanqian Li, Xiaotian Sun, Xinyu Wang, Lei Wang, Yinhe Han, Xiaoming Chen

TL;DR

PIMSYN replaces the manual design of Processing-in-Memory (PIM) CNN accelerators with an automatic, full-stack synthesis framework. It uses a four-stage pipeline (weight duplication, dataflow compilation, macro partitioning, and components allocation) with integrated design-space exploration based on simulated annealing and evolutionary algorithms. The framework produces dataflow schedules and hardware mappings that improve power efficiency and throughput over manual designs and prior co-exploration work. Because it is largely device-agnostic and can explore very large design spaces, PIMSYN is practical for energy-efficient PIM-based CNN acceleration.

Abstract

Processing-in-memory architectures have been regarded as a promising solution for CNN acceleration. Existing PIM accelerator designs rely heavily on the experience of experts and require significant manual design overhead. Manual design cannot effectively optimize and explore architecture implementations. In this work, we develop an automatic framework, PIMSYN, for synthesizing PIM-based CNN accelerators, which greatly facilitates architecture design and helps generate energy-efficient accelerators. PIMSYN can automatically transform CNN applications into execution workflows and hardware construction of PIM accelerators. To systematically optimize the architecture, we embed an architectural exploration flow into the synthesis framework, providing a more comprehensive design space. Experiments demonstrate that PIMSYN improves the power efficiency by several times compared with existing works. PIMSYN can be obtained from https://github.com/lixixi-jook/PIMSYN-NN.
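
To make the exploration idea concrete, here is a minimal sketch of simulated annealing applied to the weight-duplication stage alone. It is not PIMSYN's actual algorithm or cost model. The function names, the latency model (a layer takes ceil(output positions / duplication factor) cycles and the pipeline is bound by its slowest layer), the crossbar budget, and the example layer sizes are all assumptions chosen for illustration.

```python
import math
import random


def bottleneck_latency(positions, dup):
    """Assumed cost: cycle count of the slowest layer after duplication."""
    return max(math.ceil(p / d) for p, d in zip(positions, dup))


def crossbars_used(xbars_per_copy, dup):
    """Total crossbars consumed when layer i holds dup[i] weight copies."""
    return sum(x * d for x, d in zip(xbars_per_copy, dup))


def anneal_duplication(positions, xbars_per_copy, budget,
                       steps=5000, t0=50.0, seed=0):
    """Toy simulated annealing over per-layer duplication factors.

    positions[i]      -- hypothetical number of output positions of layer i
    xbars_per_copy[i] -- hypothetical crossbars needed for one weight copy
    budget            -- hypothetical total crossbar budget
    """
    rng = random.Random(seed)
    dup = [1] * len(positions)
    if crossbars_used(xbars_per_copy, dup) > budget:
        raise ValueError("budget too small for one copy per layer")

    cur = bottleneck_latency(positions, dup)
    best, best_cost = dup[:], cur
    for step in range(steps):
        # Linear cooling schedule; the small epsilon avoids division by zero.
        temp = t0 * (1.0 - step / steps) + 1e-9
        # Neighbour move: add or remove one copy of a random layer.
        cand = dup[:]
        i = rng.randrange(len(cand))
        cand[i] = max(1, cand[i] + rng.choice((-1, 1)))
        if crossbars_used(xbars_per_copy, cand) > budget:
            continue  # reject moves that violate the resource constraint
        cost = bottleneck_latency(positions, cand)
        # Always accept improvements; accept regressions with Boltzmann probability.
        if cost <= cur or rng.random() < math.exp((cur - cost) / temp):
            dup, cur = cand, cost
            if cur < best_cost:
                best, best_cost = dup[:], cur
    return best, best_cost


if __name__ == "__main__":
    # Hypothetical three-layer network.
    positions = [3136, 784, 196]
    xbars_per_copy = [2, 4, 8]
    print(anneal_duplication(positions, xbars_per_copy, budget=64))
```

In this toy model, early layers with many output positions tend to receive more copies because they dominate the bottleneck. A real flow would also have to account for the later stages (dataflow compilation, macro partitioning, components allocation) and for power, which this sketch ignores.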

Paper Structure

This paper contains 22 sections, 6 equations, 9 figures, 5 tables, 2 algorithms.

Figures (9)

  • Figure 1: Crossbar-accelerated convolution computation and weight duplication.
  • Figure 2: Architecture abstraction of PIM-based CNN accelerators. (a) Overall architecture. (b) Macro. (c) PE.
  • Figure 3: Overview of PIMSYN framework.
  • Figure 4: Dependency relationship between IRs.
  • Figure 5: (a) Normalized delay caused by inter-layer ADC reuse. (b) Normalized number of reduced ADCs after reuse.
  • ...and 4 more figures