NeuroFlex: Column-Exact ANN-SNN Co-Execution Accelerator with Cost-Guided Scheduling

Varun Manjunath, Pranav Ramesh, Gopalakrishnan Srinivasan

TL;DR

...

Abstract

NeuroFlex is a column-level accelerator that co-executes artificial and spiking neural networks to minimize energy-delay product (EDP) on sparse edge workloads while keeping accuracy competitive. The design extends integer-exact QCFS ANN-SNN conversion from layers to independent columns. It unifies INT8 storage with on-the-fly spike generation, and it uses an offline cost model to assign columns to ANN or SNN cores and to pack work across processing elements with deterministic runtime. Our cost-guided scheduling algorithm improves throughput by 16-19% over random mapping and lowers EDP by 57-67% versus a strong ANN-only baseline across VGG-16, ResNet-34, GoogLeNet, and BERT models. NeuroFlex also delivers up to 2.5x speedup over LoAS and 2.51x energy reduction over SparTen. These results indicate that fine-grained, integer-exact hybridization outperforms single-mode designs on energy and latency without sacrificing accuracy.

Paper Structure

This paper contains 46 sections, 3 theorems, 5 equations, 11 figures, 1 table, 2 algorithms.

Key Result

Theorem 1

For any column $c$ operating with integer threshold $\theta_n$ and quantization step $L_n$, the SNN realization $\mathcal{M}_c$ is mathematically equivalent to its ANN realization $\mathcal{N}_c$, i.e.

Figures (11)

  • Figure 1: Comparison of execution granularity across accelerators. (Left) SNN-only accelerator: all columns executed as SNNs are energy-efficient but slow. (Right) ANN-only accelerator: all columns executed as ANNs are fast but power hungry. (Center) NeuroFlex enables column-wise hybridization: each output column runs as ANN or SNN based on its cost, achieving high utilization and balancing energy and latency. Empty cells represent zeros.
  • Figure 2: Top-level organization of the proposed NeuroFlex accelerator showing unified memory, shared compressor, FiberCache, and the dual ANN/SNN compute cores.
  • Figure 3: Organization of the ANN core. Each PE performs fetch, prefix alignment, MAC accumulation, and QCFS activation before immediate write-back.
  • Figure 4: Organization of the SNN core. Each PE includes dual prefix-sum circuits, pseudo-accumulators, and correction accumulators to realize PASCAL-equivalent integrate-and-fire computation. Data A and B correspond to activation inputs and weights, respectively. Only nonzero activations in Data A that match active weights undergo spike generation.
  • Figure 5: Cost-space diagnostics used in offline scheduling. A: marginal efficiency of moving a single column to SNN (energy saved / delay penalty). B: energy–delay scatter with iso-EDP contours; the labeled point is the minimum-EDP assignment; $m$ values are the number of matches per column. $m$ used: [12, 16, 44, 52, 57, 71, 114, 125, 140, 216].
  • ...and 6 more figures
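
The minimum-EDP assignment described in the Figure 5 caption can be illustrated with a small brute-force search. This is only a sketch under stated assumptions, not the paper's scheduler: the per-match cost constants, the rule that the two cores run concurrently (so the slower core sets latency), and the names `edp` and `best_assignment` are all hypothetical. Only the list of $m$ values comes from the caption.

```python
from itertools import product

# Matches per column, copied from the Figure 5 caption.
M = [12, 16, 44, 52, 57, 71, 114, 125, 140, 216]

# Hypothetical per-match costs in arbitrary units (NOT from the paper).
ANN_ENERGY, ANN_DELAY = 4.0, 1.0   # ANN mode: fast but power hungry
SNN_ENERGY, SNN_DELAY = 1.0, 3.0   # SNN mode: energy-efficient but slow


def edp(assignment, m=M):
    """Energy-delay product for a tuple of 'A'/'S' modes, one per column."""
    energy = sum(
        k * (ANN_ENERGY if mode == "A" else SNN_ENERGY)
        for mode, k in zip(assignment, m)
    )
    ann_time = sum(k * ANN_DELAY for mode, k in zip(assignment, m) if mode == "A")
    snn_time = sum(k * SNN_DELAY for mode, k in zip(assignment, m) if mode == "S")
    # Assumption: both cores run concurrently, so latency is the slower core.
    return energy * max(ann_time, snn_time)


def best_assignment(m=M):
    """Exhaustively search all 2**len(m) column-to-core assignments."""
    return min(product("AS", repeat=len(m)), key=lambda a: edp(a, m))


if __name__ == "__main__":
    best = best_assignment()
    print("assignment:", "".join(best))
    print("EDP:", edp(best))
    print("all-ANN EDP:", edp(("A",) * len(M)))
    print("all-SNN EDP:", edp(("S",) * len(M)))
```

Exhaustive search is practical here only because there are ten columns; a real schedule over many columns would need a heuristic or a cost-guided greedy pass, which this sketch does not attempt to reproduce.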

Theorems & Definitions (3)

  • Theorem 1
  • Corollary 1.1
  • Corollary 1.2
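
The kind of equivalence Theorem 1 claims can be checked numerically for a single neuron. The theorem statement is truncated in this listing, so the sketch below does not reproduce the paper's formulation. It assumes the standard QCFS setting instead: a constant integer input `z` at every time step, a soft-reset integrate-and-fire neuron initialized to half its threshold, and a number of time steps equal to the quantization step $L_n$. The function names are hypothetical.

```python
def qcfs_level(z, theta, L):
    """Integer QCFS output level: clip(floor(z*L/theta + 1/2), 0, L)."""
    # (2*z*L + theta) // (2*theta) equals floor(z*L/theta + 1/2) exactly,
    # because Python's // is floor division on integers.
    return max(0, min(L, (2 * z * L + theta) // (2 * theta)))


def if_spike_count(z, theta, T):
    """Spikes emitted by a soft-reset integrate-and-fire neuron over T steps."""
    # The membrane potential is stored doubled so that the initial value
    # theta/2 stays an integer even when theta is odd.
    v = theta
    spikes = 0
    for _ in range(T):
        v += 2 * z              # constant input current each step (assumption)
        if v >= 2 * theta:      # threshold test in the doubled scale
            v -= 2 * theta      # soft reset: subtract threshold, keep residue
            spikes += 1
    return spikes


if __name__ == "__main__":
    theta, L = 10, 4
    for z in range(-3, 15):
        print(z, qcfs_level(z, theta, L), if_spike_count(z, theta, L))
```

Under these assumptions the spike count and the quantized ANN level agree for every integer input, including negative inputs (both give 0) and inputs above the threshold (both saturate at `L`).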