Scalable Construction of Spiking Neural Networks using up to thousands of GPUs
Bruno Golosio, Gianmarco Tiddia, José Villamar, Luca Pontisso, Luca Sergi, Francesco Simula, Pooja Babu, Elena Pastorelli, Abigail Morrison, Markus Diesmann, Alessandro Lonardo, Pier Stanislao Paolucci, Johanna Senk
TL;DR
The study tackles the challenge of simulating large-scale spiking neural networks on GPU clusters by redesigning network construction and spike communication for extreme parallelism. It introduces a memory-efficient network-construction workflow that runs onboard each GPU. Every GPU builds its local connectivity and prepares its MPI communication structures itself, using proxy (image) neurons to stand in for remote source neurons and route their spikes. The work analyzes two MPI spike-delivery schemes (point-to-point and collective) and four optimization levels that trade GPU memory usage against time-to-solution. It demonstrates strong and weak scaling on the Multi-Area Model and on scalable balanced networks across thousands of GPUs. The results show substantial speedups in network construction, memory footprints viable for exascale-class machines, and practical guidance on structure-aware mapping and hybrid communication strategies. The code is released as NEST GPU 2.0.
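The sketch below makes the image-neuron idea and the two delivery schemes concrete. It is a minimal, single-process Python sketch that assumes a hypothetical flat list of connections tagged with source and target ranks. The function names (`build_image_maps`, `point_to_point_volume`, `collective_volume`, `deliver`) are illustrative only and are not part of the NEST GPU 2.0 API. Real MPI calls and GPU memory layout are deliberately left out. Each target rank allocates one image neuron per distinct remote source. Counting the spike entries each scheme moves then shows what the two schemes trade against each other.

```python
from collections import defaultdict

# Hypothetical data layout for illustration; not the NEST GPU 2.0 implementation.


def build_image_maps(connections):
    """Assign image neurons for remote sources.

    connections: iterable of (src_rank, src_id, tgt_rank, tgt_id) tuples,
    where ids are local to their rank.
    Returns:
      image_of[tgt_rank][(src_rank, src_id)] -> local image-neuron index
      ranks_needing[(src_rank, src_id)]      -> set of ranks holding an image
    """
    image_of = defaultdict(dict)
    ranks_needing = defaultdict(set)
    for src_rank, src_id, tgt_rank, _tgt_id in connections:
        if src_rank == tgt_rank:
            continue  # local synapse: the spike never leaves this rank
        images = image_of[tgt_rank]
        key = (src_rank, src_id)
        if key not in images:
            images[key] = len(images)  # next free image-neuron slot on tgt_rank
        ranks_needing[key].add(tgt_rank)
    return image_of, ranks_needing


def point_to_point_volume(spikes, ranks_needing):
    """Spike entries sent when each spike goes only to ranks holding its image."""
    return sum(len(ranks_needing.get(s, ())) for s in spikes)


def collective_volume(spikes, n_ranks):
    """Spike entries delivered when every spike is broadcast to all other ranks."""
    return len(spikes) * (n_ranks - 1)


def deliver(spikes, image_of, rank):
    """Map received spikes to local image-neuron indices, dropping unneeded ones."""
    images = image_of.get(rank, {})
    return sorted(images[s] for s in spikes if s in images)


if __name__ == "__main__":
    connections = [
        (0, 0, 1, 0),  # neuron 0 on rank 0 -> rank 1
        (0, 0, 2, 3),  # neuron 0 on rank 0 -> rank 2
        (0, 0, 1, 2),  # same source again on rank 1: reuses its image neuron
        (0, 1, 0, 5),  # purely local connection on rank 0
        (2, 4, 1, 1),  # neuron 4 on rank 2 -> rank 1
    ]
    image_of, ranks_needing = build_image_maps(connections)
    spikes = [(0, 0), (0, 1), (2, 4)]  # (src_rank, src_id) that fired this step
    print(point_to_point_volume(spikes, ranks_needing))  # 3
    print(collective_volume(spikes, n_ranks=3))          # 6
    print(deliver(spikes, image_of, rank=1))             # [0, 1]
```

In this toy example the point-to-point variant moves 3 spike entries against 6 for the collective one, but it needs the extra `ranks_needing` routing table on the sending side. The collective variant only needs each receiver's image map to filter incoming spikes. How these costs balance on real hardware is what the measured scaling results address; the sketch does not predict it.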
Abstract
Diverse scientific and engineering research areas deal with discrete, time-stamped changes in large systems of interacting delay differential equations. Simulating such complex systems at scale on high-performance computing clusters demands efficient management of communication and memory. Inspired by the human cerebral cortex, a sparsely connected network of $\mathcal{O}(10^{10})$ neurons, each forming $\mathcal{O}(10^{3})$--$\mathcal{O}(10^{4})$ synapses and communicating via short electrical pulses called spikes, we study the simulation of large-scale spiking neural networks for computational neuroscience research. This work presents a novel network construction method for multi-GPU clusters and upcoming exascale supercomputers using the Message Passing Interface (MPI), in which each process builds its local connectivity and prepares the data structures for efficient spike exchange across the cluster during state propagation. We demonstrate the scaling performance of two cortical models using point-to-point and collective communication, respectively.
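The orders of magnitude quoted above give a rough sense of why memory management matters at this scale. The following is only a back-of-the-envelope sketch. The per-synapse size $b = 8$ bytes and the GPU count $P = 10^{4}$ are hypothetical values chosen for illustration, not figures from this work. The estimate also ignores neuron state and communication buffers.

```latex
\begin{align*}
N_{\mathrm{syn}} &\approx N \times K
  = 10^{10} \times \left(10^{3}\ \text{to}\ 10^{4}\right)
  = 10^{13}\ \text{to}\ 10^{14}, \\
M_{\mathrm{total}} &= b \, N_{\mathrm{syn}}
  = 8\,\mathrm{B} \times \left(10^{13}\ \text{to}\ 10^{14}\right)
  = 80\ \text{to}\ 800\,\mathrm{TB}, \\
M_{\mathrm{GPU}} &= \frac{M_{\mathrm{total}}}{P}
  = \frac{80\ \text{to}\ 800\,\mathrm{TB}}{10^{4}}
  = 8\ \text{to}\ 80\,\mathrm{GB}.
\end{align*}
```

Under these assumptions, the synapses alone would occupy a substantial fraction of a single GPU's memory even when spread over ten thousand devices.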
