I Like To Move It -- Computation Instead of Data in the Brain

Fabian Czappa, Marvin Kaster, Felix Wolf

TL;DR

The paper tackles the scalability of brain simulations with structural plasticity by addressing two bottlenecks: connectivity updates and spike exchange. It introduces two methods. The first is a location-aware Barnes–Hut algorithm that moves computation to where the data resides, achieving $O(1)$ per-neuron communication in the worst case. The second is a firing-rate approximation that reduces synchronization by exchanging firing frequencies rather than individual spikes, controlled by an epoch length $\Delta$. Theoretical analysis and large-scale experiments show that connectivity-update time drops by up to a factor of 6, spike-exchange time by more than two orders of magnitude, and overall wall-clock time by about 78.8%, with data-transfer costs greatly reduced. These advances enable larger simulations based on the Model of Structural Plasticity (MSP). The paper also points to GPU acceleration as a route toward more extensive, eventually whole-brain, modeling and names the mapping of repeated Barnes–Hut computations to GPUs as a future challenge. For context, a naïve MSP connectivity update costs $O(n^2)$; Barnes–Hut (BH) techniques reduce this to $O(n \log n)$, and the proposed optimizations then cut the remaining communication overhead.

Abstract

The detailed functioning of the human brain is still poorly understood. Brain simulations are a well-established way to complement experimental research, but must contend with the computational demands of the approximately $10^{11}$ neurons and the $10^{14}$ synapses connecting them, the network of the latter referred to as the connectome. Studies suggest that changes in the connectome (i.e., the formation and deletion of synapses, also known as structural plasticity) are essential for critical tasks such as memory formation and learning. The connectivity update can be efficiently computed using a Barnes–Hut-inspired approximation that lowers the computational complexity from $O(n^2)$ to $O(n \log n)$, where $n$ is the number of neurons. However, updating synapses, which relies heavily on remote memory access (RMA), and the spike exchange between neurons, which requires all-to-all communication at every time step, still hinder scalability. We present a new algorithm that significantly reduces the communication overhead by moving computation instead of data. This shrinks the time it takes to update connectivity by a factor of six and the time it takes to exchange spikes by more than two orders of magnitude.
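To make the $O(n^2)$ versus $O(n \log n)$ contrast concrete, the following is a minimal single-process sketch of a generic Barnes–Hut-style approximation, not the paper's distributed implementation. It approximates the sum of a Gaussian kernel over all other points by treating a sufficiently distant quadtree cell as one pseudo-point at its centroid. The names `Cell`, `build_tree`, `approx_sum`, `exact_sum`, the opening parameter `theta`, and the kernel width `sigma` are all illustrative assumptions, and the sketch assumes distinct 2D points in the unit square.

```python
import math
import random


class Cell:
    """Square quadtree cell storing the count and coordinate sums of its points."""

    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half  # centre and half-width
        self.count = 0            # number of points inside this cell
        self.sx = self.sy = 0.0   # coordinate sums (centroid = sum / count)
        self.point = None         # the single point held by a leaf
        self.children = None      # four sub-cells after a split

    def _child(self, p):
        # Index matches the creation order in insert(): (dx, dy) in (-1,-1), (-1,1), (1,-1), (1,1).
        return self.children[2 * (p[0] >= self.cx) + (p[1] >= self.cy)]

    def insert(self, p):
        if self.children is None and self.count == 1:
            # Occupied leaf: split it and push the old point one level down.
            h = self.half / 2
            self.children = [Cell(self.cx + dx * h, self.cy + dy * h, h)
                             for dx in (-1, 1) for dy in (-1, 1)]
            old, self.point = self.point, None
            self._child(old).insert(old)
        if self.children is None:
            self.point = p
        else:
            self._child(p).insert(p)
        self.count += 1
        self.sx += p[0]
        self.sy += p[1]


def build_tree(points):
    """Build a quadtree over points in the unit square (assumed distinct)."""
    root = Cell(0.5, 0.5, 0.5)
    for p in points:
        root.insert(p)
    return root


def kernel(d2, sigma):
    """Gaussian kernel of the squared distance d2 (illustrative choice)."""
    return math.exp(-d2 / (sigma * sigma))


def exact_sum(points, q, sigma):
    """Brute-force kernel sum over all points except q; O(n) per query, O(n^2) overall."""
    return sum(kernel((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2, sigma)
               for p in points if p != q)


def approx_sum(cell, q, theta, sigma):
    """Tree-based kernel sum: accept a cell when width / distance < theta."""
    if cell.count == 0:
        return 0.0
    mx, my = cell.sx / cell.count, cell.sy / cell.count
    d2 = (mx - q[0]) ** 2 + (my - q[1]) ** 2
    if cell.children is None:
        return 0.0 if cell.point == q else kernel(d2, sigma)  # skip the query itself
    if (2 * cell.half) ** 2 < theta * theta * d2:
        return cell.count * kernel(d2, sigma)  # whole cell treated as one pseudo-point
    return sum(approx_sum(c, q, theta, sigma) for c in cell.children)
```

With `theta = 0` no cell is ever accepted and the traversal reproduces the brute-force sum; larger values of `theta` trade accuracy for fewer visited nodes. In a distributed setting, the nodes visited during this traversal may live on other ranks, which is where the remote memory accesses mentioned in the abstract come from.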

Paper Structure

This paper contains 25 sections, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Example of a distributed tree (shown is a binary tree for the sake of simplicity) from the view of the first process. Boxes are inner nodes, and circles are leaf nodes; the numeric label indicates the process owning the node. As we depict the view from process 1, it owns the (replica of the) root node. Solid lines indicate information on process 1, and dashed lines indicate information that resides on other ranks.
  • Figure 2: The neuron from process 1 (the leaf node with the blue solid marking) will propose a synapse to the neuron from process 2 (the leaf node with the red solid marking). In the default Barnes–Hut implementation, process 1 must download all nodes marked red via remote-memory access. In our proposed algorithm, process 1 sends only a part of the information from the neuron to process 2. Thus, the transfer direction has mostly been reversed, and the computation has been moved to the target node.
  • Figure 3: Timing plots for the old Barnes–Hut algorithm and our proposed new Barnes–Hut algorithm, given for varying numbers of neurons per MPI rank: 1024 neurons per rank on the top left, 4096 neurons per rank on the top right, 16 384 neurons per rank on the bottom left, and 65 536 neurons per rank on the bottom right.
  • Figure 4: Timing plots for the transfer of neuron spikes and neuron firing frequencies, given for varying numbers of neurons per MPI rank: 1024 neurons per rank on the top left, 4096 neurons per rank on the top right, 16 384 neurons per rank on the bottom left, and 65 536 neurons per rank on the bottom right. Note that transferring the frequencies is virtually free.
  • Figure 5: Timing plots for look-up of spikes from distant neurons (by a binary search) and for the approximation of spikes based on the frequency with a pseudo-random number generator (PRNG), given for varying numbers of neurons per MPI rank: 1024 neurons per rank on the top left, 4096 neurons per rank on the top right, 16 384 neurons per rank on the bottom left, and 65 536 neurons per rank on the bottom right.
  • ...and 2 more figures
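
The two receiver-side strategies compared in Figures 4 and 5 can be illustrated with a small sketch. It is written under stated assumptions rather than taken from the paper's code: the exact variant receives, at every step, a sorted list of remote neuron IDs that fired and answers queries by binary search, while the approximate variant receives one firing frequency per remote neuron per epoch and draws spikes locally from a PRNG. The function names and the choice of defining the frequency as the fraction of steps with a spike in an epoch are hypothetical.

```python
import bisect
import random


def fired_lookup(sorted_fired_ids, neuron_id):
    """Exact variant: binary search in the sorted IDs of remote neurons that fired this step."""
    i = bisect.bisect_left(sorted_fired_ids, neuron_id)
    return i < len(sorted_fired_ids) and sorted_fired_ids[i] == neuron_id


def firing_rate(spike_train):
    """Fraction of steps with a spike in one epoch (assumed definition of the frequency)."""
    return sum(spike_train) / len(spike_train)


def fired_approx(rate, rng):
    """Approximate variant: draw a spike locally with probability `rate`."""
    return rng.random() < rate


def approximate_epoch(rate, delta, rng):
    """Reconstruct a spike train of `delta` steps from a single transmitted frequency."""
    return [1 if fired_approx(rate, rng) else 0 for _ in range(delta)]
```

Under these assumptions, an epoch of $\Delta$ steps needs $\Delta$ rounds of spike exchange in the exact variant but only one frequency exchange in the approximate one, at the cost of losing the exact spike timing within the epoch.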