GPU-RANC: A CUDA Accelerated Simulation Framework for Neuromorphic Architectures

Sahil Hassan, Michael Inouye, Miguel C. Gonzalez, Ilkin Aliyev, Joshua Mack, Maisha Hafiz, Ali Akoglu

TL;DR

This work addresses the long runtimes of neuromorphic architecture studies by presenting GPU-RANC, a CUDA-based acceleration of the RANC simulator. The authors implement grid-level, synapse-level, and memory-layout optimizations that deliver substantial end-to-end speedups across diverse workloads, including MNIST, VMM, CIFAR-10, and TrueNorth-like configurations, reaching up to 780x over the serial RANC simulator. The approach enables rapid exploration of architectural parameters (e.g., crossbars, weights, bitwidths) and accelerates convergence toward optimized neuromorphic designs. The results show large computational gains and make large-scale neuromorphic experimentation practical. Future work targets streaming execution and multi-GPU expansion.

Abstract

Open-source simulation tools play a crucial role in helping neuromorphic application engineers and hardware architects investigate performance bottlenecks and explore design optimizations before committing to silicon. Reconfigurable Architecture for Neuromorphic Computing (RANC) is one such tool. It offers the ability to execute pre-trained Spiking Neural Network (SNN) models within a unified ecosystem through both software-based simulation and FPGA-based emulation. The community has used RANC's flexible and highly parameterized design to study implementation bottlenecks, tune architectural parameters, modify neuron behavior based on application insights, and study the trade space between hardware performance and network accuracy. Designing neuromorphic computing architectures involves an extremely large number of configuration parameters, such as the number and precision of weights per neuron, neuron and axon counts per core, network topology, and neuron behavior. To accelerate such studies and provide users with streamlined, productive design space exploration, in this paper we introduce the GPU-based implementation of RANC. We summarize our parallelization approach and quantify the speedup gains achieved with GPU-based tick-accurate simulations across various use cases. We demonstrate up to 780 times speedup compared to the serial version of the RANC simulator on a 512-core neuromorphic MNIST inference application. We believe that the RANC ecosystem now provides a much more feasible avenue for exploring different optimizations for accelerating SNNs and for performing richer studies, by enabling rapid convergence to optimized neuromorphic architectures.
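The following minimal Python sketch makes the idea of a tick-accurate simulation concrete by showing one tick of a simplified core model. It assumes a TrueNorth-style integrate, leak, fire, and reset neuron with a binary crossbar and per-neuron weights selected by axon type. The `Core` class, the `tick` function, and all field names are hypothetical and are not RANC's actual data structures or API. The nested loops illustrate why the workload parallelizes well: within a tick, cores are independent of each other, and so are the neurons within a core. That makes cores and neurons natural candidates for CUDA blocks and threads, respectively. Whether GPU-RANC partitions work exactly this way is an assumption here, not a claim from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Core:
    # Hypothetical core model (assumed, not RANC's real layout).
    crossbar: List[List[int]]   # crossbar[a][n] == 1 if axon a connects to neuron n
    axon_type: List[int]        # axon_type[a] selects which neuron weight axon a uses
    weights: List[List[int]]    # weights[n][t] is neuron n's weight for axon type t
    leak: int                   # added to every potential once per tick
    threshold: int              # fire when potential >= threshold
    reset: int = 0              # potential after firing
    potential: List[int] = field(default_factory=list)

    def __post_init__(self):
        # Start every neuron at the reset potential if none was given.
        if not self.potential:
            self.potential = [self.reset] * len(self.weights)


def tick(cores: List[Core], spikes_in: List[List[int]]) -> List[List[int]]:
    """Advance every core by exactly one tick.

    spikes_in[c][a] is 1 if axon a of core c receives a spike this tick.
    Returns spikes_out[c][n], 1 if neuron n of core c fired.
    """
    spikes_out = []
    for c, core in enumerate(cores):           # cores are independent: one GPU block each (assumed)
        fired = []
        for n in range(len(core.weights)):     # neurons are independent: one thread each (assumed)
            v = core.potential[n]
            # Integrate: sum weights of active axons connected through the crossbar.
            for a, active in enumerate(spikes_in[c]):
                if active and core.crossbar[a][n]:
                    v += core.weights[n][core.axon_type[a]]
            v += core.leak                     # leak
            if v >= core.threshold:            # fire and reset
                fired.append(1)
                v = core.reset
            else:
                fired.append(0)
            core.potential[n] = v
        spikes_out.append(fired)
    return spikes_out
```

Each call advances every core by one tick. In a full simulator, the returned spikes would be routed and scheduled before they become input to a later tick.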

Paper Structure

This paper contains 13 sections, 7 figures, 2 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Overview of the RANC architecture.
  • Figure 2: Router mapping illustration. In the reference serial implementation, packets move across each core on the way to their destination. In the GPU implementation, packets are routed directly to their destination core to match the concurrency level of the neuron block.
  • Figure 3: Scheduler mapping with a maximum tick offset of two and eight axons ($a_0$ through $a_7$) per core.
  • Figure 4: Speedup over serial execution achieved by different levels of Neuron Block optimization across core counts, using MNIST.
  • Figure 5: RANC Router Kernel Speedup vs CUDA Block Size. Speedup normalized against serial router time.
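The router and scheduler mappings described in the Figure 2 and Figure 3 captions can be illustrated in a few lines of Python. This sketch assumes that each spike packet carries a relative core offset (`dx`, `dy`), a destination axon, and a tick offset. It contrasts serial-style, dimension-ordered (x then y) hop-by-hop routing with computing the destination core in one step. It also stores scheduled spikes in a flat ring buffer indexed by tick slot and axon. The packet fields, the hop order, the ring depth of `max_tick_offset + 1`, and every name (`route_hop_by_hop`, `route_direct`, `Scheduler`) are assumptions for illustration, not GPU-RANC's implementation.

```python
from typing import List, Tuple


def route_hop_by_hop(src: Tuple[int, int], dx: int, dy: int) -> Tuple[int, int]:
    """Serial-style routing (assumed): move one core at a time, x first, then y."""
    x, y = src
    while dx != 0:
        step = 1 if dx > 0 else -1
        x += step
        dx -= step
    while dy != 0:
        step = 1 if dy > 0 else -1
        y += step
        dy -= step
    return (x, y)


def route_direct(src: Tuple[int, int], dx: int, dy: int) -> Tuple[int, int]:
    """Direct delivery: compute the destination core in a single step."""
    return (src[0] + dx, src[1] + dy)


class Scheduler:
    """Hypothetical flat ring buffer of spike bits: one row of axons per tick slot."""

    def __init__(self, num_axons: int, max_tick_offset: int):
        self.num_axons = num_axons
        self.depth = max_tick_offset + 1          # offsets 0..max_tick_offset (assumed)
        self.bits = [0] * (self.depth * num_axons)

    def index(self, tick: int, offset: int, axon: int) -> int:
        """Flat index of (target tick slot, axon) in row-major order."""
        if not 0 <= offset < self.depth:
            raise ValueError("tick offset out of range")
        slot = (tick + offset) % self.depth
        return slot * self.num_axons + axon

    def deliver(self, tick: int, offset: int, axon: int) -> None:
        """Record a spike for `axon` that should arrive `offset` ticks after `tick`."""
        self.bits[self.index(tick, offset, axon)] = 1

    def pop(self, tick: int) -> List[int]:
        """Return and clear the axon spikes scheduled for `tick`."""
        start = (tick % self.depth) * self.num_axons
        row = self.bits[start:start + self.num_axons]
        self.bits[start:start + self.num_axons] = [0] * self.num_axons
        return row
```

The direct computation yields the same destination as the hop-by-hop walk, so each packet can be delivered independently in one step. This is in line with the Figure 2 caption's point about matching the neuron block's concurrency. With the Figure 3 parameters (a maximum tick offset of two and eight axons), the assumed layout holds 3 × 8 = 24 spike bits per core.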