Evolutionary Mapping of Neural Networks to Spatial Accelerators

Alessandro Pierro, Jonathan Timcheck, Jason Yik, Marius Lindauer, Eyke Hüllermeier, Marcel Wever

TL;DR

The paper addresses the mapping of neural networks onto spatial accelerators, where performance depends on how layers are partitioned across cores and where those partitions are placed on the chip. It introduces a bilevel, hardware-in-the-loop evolutionary framework that jointly optimizes partitioning and placement for Intel Loihi 2 without requiring hardware expertise. Compared with default heuristics, the method reduces total latency by up to 35% on two sparse multi-layer perceptron networks and improves energy efficiency by up to 40% without explicitly optimizing for it, with results covering both single-chip and multi-chip configurations. These gains underscore the importance of spatial locality and topology-aware placement. The work lays a foundation for scalable, automated deployment of workloads on spatial accelerators and points to future work on hierarchical mapping for scalability, surrogate models, and transfer learning of mappings across hardware revisions.

Abstract

Spatial accelerators, composed of arrays of compute-memory integrated units, offer an attractive platform for deploying inference workloads with low latency and low energy consumption. However, fully exploiting their architectural advantages typically requires careful, expert-driven mapping of computational graphs to distributed processing elements. In this work, we automate this process by framing the mapping challenge as a black-box optimization problem. We introduce the first evolutionary, hardware-in-the-loop mapping framework for neuromorphic accelerators, enabling users without deep hardware knowledge to deploy workloads more efficiently. We evaluate our approach on Intel Loihi 2, a representative spatial accelerator featuring 152 cores per chip in a 2D mesh. Our method achieves up to a 35% reduction in total latency compared to default heuristics on two sparse multi-layer perceptron networks. Furthermore, we demonstrate the scalability of our approach to multi-chip systems and observe up to a 40% improvement in energy efficiency, without explicitly optimizing for it.
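
To make the black-box framing concrete, the following is a minimal sketch of a nested (bilevel) evolutionary search, not the authors' implementation. An outer loop mutates the partitioning (cores per layer) and an inner loop reorders the placement of those partitions on a 2D mesh. Everything here is an assumption for illustration: the 4x4 mesh, the layer sizes, the core capacity, and above all `proxy_cost`, a made-up hop-count-plus-load score standing in for the latency that the paper measures on hardware.

```python
import random

# All constants below are hypothetical toy values, not Loihi 2 parameters.
MESH_W, MESH_H = 4, 4        # toy 2D mesh of 16 cores
LAYER_SIZES = [64, 32, 16]   # neurons per layer
MAX_PER_CORE = 32            # assumed per-core capacity


def min_cores(size):
    """Fewest cores that can hold a layer of the given size."""
    return -(-size // MAX_PER_CORE)  # ceiling division


def proxy_cost(partitioning, placement):
    """Assumed stand-in for measured latency: Manhattan hops between the
    cores of consecutive layers, plus the largest per-core load."""
    coords = [(i % MESH_W, i // MESH_W) for i in placement]
    groups, start = [], 0
    for n in partitioning:  # split the flat core list into per-layer groups
        groups.append(coords[start:start + n])
        start += n
    hops = sum(abs(x0 - x1) + abs(y0 - y1)
               for src, dst in zip(groups, groups[1:])
               for (x0, y0) in src for (x1, y1) in dst)
    load = max(size / n for size, n in zip(LAYER_SIZES, partitioning))
    return hops + load


def reorder(placement, rng):
    """Placement operator: swap two partitions or move one to a free core."""
    child = list(placement)
    i = rng.randrange(len(child))
    target = rng.randrange(MESH_W * MESH_H)
    if target in child:
        j = child.index(target)
        child[i], child[j] = child[j], child[i]
    else:
        child[i] = target
    return child


def evolve_placement(partitioning, rng, steps=200):
    """Inner level: (1+1)-style search over placements for one partitioning."""
    best = rng.sample(range(MESH_W * MESH_H), sum(partitioning))
    best_cost = proxy_cost(partitioning, best)
    for _ in range(steps):
        child = reorder(best, rng)
        cost = proxy_cost(partitioning, child)
        if cost <= best_cost:
            best, best_cost = child, cost
    return best, best_cost


def mutate_partitioning(partitioning, rng):
    """Partitioning operator: add or remove one core from a random layer."""
    child = list(partitioning)
    i = rng.randrange(len(child))
    child[i] = max(min_cores(LAYER_SIZES[i]), child[i] + rng.choice([-1, 1]))
    return child if sum(child) <= MESH_W * MESH_H else list(partitioning)


def nested_evolution(generations=20, seed=0):
    """Outer level: keep a candidate partitioning only if its best placement
    found by the inner search beats the incumbent."""
    rng = random.Random(seed)
    part = [min_cores(s) for s in LAYER_SIZES]
    placement, cost = evolve_placement(part, rng)
    history = [cost]
    for _ in range(generations):
        cand = mutate_partitioning(part, rng)
        cand_place, cand_cost = evolve_placement(cand, rng)
        if cand_cost < cost:
            part, placement, cost = cand, cand_place, cand_cost
        history.append(cost)
    return part, placement, cost, history
```

Calling `nested_evolution()` returns the best partitioning, its placement, the proxy cost, and a best-so-far history. In a hardware-in-the-loop setting, `proxy_cost` would be replaced by deploying the candidate mapping and timing it, which is why each fitness evaluation is expensive and the number of evaluations is the natural budget.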

Paper Structure (28 sections, 5 equations, 8 figures, 1 table)

Figures (8)

  • Figure 1: Partitioning and placement of a neural network to a 2D mesh of cores, interconnected by a network-on-chip. Layer 1 (orange) is partitioned to four different cores (squares), connected via routers (circles). The activations from each layer are forwarded through the routers and links.
  • Figure 2: Illustration of mutation and reordering operators for the partitioning and placement populations respectively.
  • Figure 3: Distribution of average latency for a fixed partitioning across 50 random placements. Results are on a 6-layer MLP workload, averaged over 200 time steps.
  • Figure 4: Diagram of placement heuristics on SparseMLP-1.
  • Figure 5: Progression of the best latency found by the nested evolution algorithm over the number of fitness evaluations for (a) SparseMLP-1 workload on a single chip, and (b) SparseMLP-1 workload on two chips. The results are averaged over 5 random trials for each workload, with the shaded area covering the min/max interval.
  • ...and 3 more figures