Hardware architecture and routing-aware training for optimal memory usage: a case study

Jimmy Weber, Theo Ballet, Melika Payvand

TL;DR

The paper addresses memory bottlenecks in neuromorphic hardware by making network training aware of routing constraints. It extends the DeepR framework with a hardware-aware sparsity constraint, $S(\theta)=\frac{\|\theta\|_0}{N^2}$, and introduces a proxy mapping function over the hop-dependent sparsities $p_d(\theta)$ to approximate placement and routing. Evaluated on the Mosaic architecture with the SHD dataset, the routing-aware model reaches 5% higher accuracy at the same parameter count, and the same accuracy with 10x less memory, compared to non-routing-aware training. The result points to a practical path for deploying larger models on memory-constrained event-based hardware through algorithm-hardware co-design.
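
As a small illustration of the sparsity measure $S(\theta)$ quoted above, the sketch below counts the nonzero entries of a square weight matrix and divides by $N^2$. It assumes that $N$ is the side length of a recurrent weight matrix and that the zero-norm counts nonzero weights; the function name `sparsity` and the example matrix are hypothetical and not taken from the paper.

```python
def sparsity(theta):
    """Return S(theta) = ||theta||_0 / N^2 for a square matrix given as a list of rows.

    Assumption: N is the side length of the matrix and ||theta||_0 is the
    number of nonzero weights.
    """
    n = len(theta)
    if any(len(row) != n for row in theta):
        raise ValueError("theta must be a square matrix")
    nonzero = sum(1 for row in theta for w in row if w != 0)
    return nonzero / (n * n)


# Hypothetical 3 x 3 weight matrix with three nonzero weights out of nine.
theta = [
    [0.0, 0.5, 0.0],
    [0.0, 0.0, -1.2],
    [0.3, 0.0, 0.0],
]
print(sparsity(theta))
```

With this definition a smaller $S(\theta)$ means fewer stored connections, so the value acts as a fraction of used connections rather than a fraction of removed ones.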

Abstract

Efficient deployment of neural networks on resource-constrained hardware demands optimal use of on-chip memory. In event-based processors, this is particularly critical for routing architectures, where substantial memory is dedicated to managing network connectivity. While prior work has focused on optimizing event routing during hardware design, optimizing memory utilization for routing during network training remains underexplored. Key challenges include: (i) integrating routing into the loss function, which often introduces non-differentiability, and (ii) computational expense in evaluating network mappability to hardware. We propose a hardware-algorithm co-design approach to train routing-aware neural networks. To address challenge (i), we extend the DeepR training algorithm, leveraging dynamic pruning and random re-assignment to optimize memory use. For challenge (ii), we introduce a proxy-based approximation of the mapping function to incorporate placement and routing constraints efficiently. We demonstrate our approach by optimizing a network for the Spiking Heidelberg Digits (SHD) dataset using a small-world connectivity-based hardware architecture as a case study. The resulting network, trained with our routing-aware methodology, is fully mappable to the hardware, achieving 5% more accuracy using the same number of parameters, and iso-accuracy with 10x less memory usage, compared to non-routing-aware training methods. This work highlights the critical role of co-optimizing algorithms and hardware to enable efficient and scalable solutions for constrained environments.
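
The "dynamic pruning and random re-assignment" mentioned in the abstract can be pictured with a simplified prune-and-regrow step in the spirit of DeepR. This is a sketch under stated assumptions, not the authors' routing-aware extension: it assumes each potential connection has one scalar parameter, that an active connection is pruned once its parameter drops to zero or below, and that an equal number of dormant connections is then re-activated uniformly at random. The names `rewire_step` and `init_value` are hypothetical.

```python
import random


def rewire_step(theta, active, rng, init_value=1e-3):
    """One simplified prune-and-regrow step (illustrative, not the paper's algorithm).

    theta  : list of connection parameters, already updated by the optimizer
    active : list of bools, True where the connection currently exists
    rng    : random.Random instance used to pick the re-activated connections

    Active connections whose parameter is <= 0 are pruned, then the same
    number of dormant connections is re-activated at random, so the number
    of active connections stays constant.
    """
    theta = list(theta)
    active = list(active)

    # Prune: active connections whose parameter crossed zero become dormant.
    pruned = [k for k, (t, a) in enumerate(zip(theta, active)) if a and t <= 0.0]
    for k in pruned:
        active[k] = False
        theta[k] = 0.0

    # Regrow: re-activate as many dormant connections as were pruned.
    dormant = [k for k, a in enumerate(active) if not a]
    for k in rng.sample(dormant, len(pruned)):
        active[k] = True
        theta[k] = init_value

    return theta, active


# Hypothetical example: six potential connections, three of them active.
rng = random.Random(0)
theta = [0.5, -0.1, 0.2, 0.0, 0.0, 0.0]
active = [True, True, True, False, False, False]
new_theta, new_active = rewire_step(theta, active, rng)
print(sum(new_active))  # the number of active connections is unchanged
```

Keeping the active count fixed is what makes such a scheme attractive for memory-limited hardware: the connection budget is set once and training only moves connections around within it.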

Paper Structure

This paper contains 6 sections, 2 equations, 3 figures, 2 algorithms.

Figures (3)

  • Figure 1: Mosaic hardware architecture, used as our case study, with a small-world connectivity layout. a) Details of the Mosaic architecture with distributed Neuron Tiles (NT) and Routing Tiles ($RT_0$ and $RT_1$). Each NT integrates incoming messages from its neighbors and sends its output to the fabric through its routing neighbors ($RT_0$). $RT_1$s interface with $RT_0$s to pass the spikes along to other RTs. b) The cost of communication between NTs (the number of hops required to go from one to another) at different locations, for a Mosaic architecture of size $3 \times 3$.
  • Figure 2: Routing information on the Mosaic architecture. a) We use the 1-turn algorithm for routing spikes, where the routing path from source to destination takes only one turn (red arrow). We use $RT_1$s (blue squares) for turning, and keep $RT_0$s (green squares) as no-turn routers. b) We use shared-path routing, such that if two destinations have overlapping paths, the path is shared for the longest possible distance before splitting toward the two destinations. c) The occupancy rate of each tile for an example network mapped onto the Mosaic architecture. The routing algorithm calculates the required resources per NT and RT, and checks whether the network is mappable onto the hardware. d) The minimum memory resources required for the NT and RT as a function of the sparsity of connections for each hop, to ensure mappability of the network onto the Mosaic hardware. RT and NT sizes refer to the input size of a square crossbar array.
  • Figure 3: Test accuracy of the routing-aware training on the SHD dataset. a) Test accuracy on SHD for different Mosaic architectures, identified by the sparsity of connections for hop 1 and hop 3 ($\hat{P} = (p_1, p_3)$), with all other hop sparsities set to $0$. The mean and standard deviation of the accuracy are calculated across 30 training runs for each ($p_1$, $p_3$) pair. b) Test accuracy on SHD as a function of the memory count in the Mosaic architecture, for the routing-aware training method compared to the non-routing-aware one. Both are compared against the vanilla Recurrent Spiking Neural Network (RSNN) results from Cramer et al. (2020).
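
The 1-turn routing of Figure 2a and the per-hop connection sparsities of Figure 3a can be illustrated with a toy model. The sketch below makes several assumptions that are not in the source: tiles sit on a plain rectangular grid (the real Mosaic layout interleaves NTs with $RT_0$ and $RT_1$ tiles, so the hop counts of Figure 1b may differ), the path travels along the row first and then along the column, and $p_d$ is taken as the fraction of possible neuron pairs at hop distance $d$ that are actually connected. The names `one_turn_path` and `hop_sparsity` are hypothetical.

```python
from collections import defaultdict


def one_turn_path(src, dst):
    """Grid cells visited from src to dst: along the row first, then the column.

    The path changes direction at most once, at cell (src_row, dst_col).
    """
    (r0, c0), (r1, c1) = src, dst
    path = [(r0, c0)]
    step = 1 if c1 >= c0 else -1
    for c in range(c0 + step, c1 + step, step):
        path.append((r0, c))
    step = 1 if r1 >= r0 else -1
    for r in range(r0 + step, r1 + step, step):
        path.append((r, c1))
    return path


def hop_sparsity(connections, tile_of):
    """Return {d: p_d}, the fraction of neuron pairs at hop distance d that are connected.

    connections : set of (pre, post) neuron index pairs that carry a weight
    tile_of     : list mapping each neuron index to its (row, col) tile
    Assumption: the hop distance is the number of moves on the 1-turn path.
    """
    used = defaultdict(int)
    possible = defaultdict(int)
    n = len(tile_of)
    for i in range(n):
        for j in range(n):
            d = len(one_turn_path(tile_of[i], tile_of[j])) - 1
            possible[d] += 1
            if (i, j) in connections:
                used[d] += 1
    return {d: used[d] / possible[d] for d in sorted(possible)}


# Hypothetical placement: neurons 0 and 1 share a tile, neurons 2 and 3 sit elsewhere.
tile_of = [(0, 0), (0, 0), (0, 1), (1, 1)]
connections = {(0, 1), (0, 2), (2, 3), (0, 3)}
p = hop_sparsity(connections, tile_of)
print(p)
```

In this toy setting, constraining a vector such as $(p_1, p_3)$ amounts to capping how many connections are allowed at each hop distance, which is the quantity the captions above relate to the required NT and RT memory.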