The Expressive Leaky Memory Neuron: an Efficient and Expressive Phenomenological Neuron Model Can Solve Long-Horizon Tasks

Aaron Spieler, Nasim Rahaman, Georg Martius, Bernhard Schölkopf, Anna Levina

TL;DR

The paper introduces the Expressive Leaky Memory (ELM) neuron, a biologically inspired phenomenological model designed to capture a cortical neuron's input–output behavior using a memory-based, nonlinear integration framework. By employing multiple memory units with learnable timescales and a nonlinear dendritic-like integration via a compact MLP, the ELM achieves accurate spike and voltage predictions for a detailed biophysical neuron with far fewer parameters than previous surrogates, and demonstrates strong long-horizon processing on tasks such as SHD-Adding and Pathfinder-X. The work also presents a Branch-ELM variant that further reduces parameter count without sacrificing performance, and provides extensive ablations and comparisons to LIF/ALIF, LSTM, TCN, and Transformers, suggesting that biologically informed inductive biases can yield powerful, efficient models for temporal computation. Overall, the ELM framework advances our understanding of cortical computation principles and offers a practical, scalable approach for long-range sequence processing with potential implications for neuroscience-inspired AI and neuromorphic hardware.

Abstract

Biological cortical neurons are remarkably sophisticated computational devices, temporally integrating their vast synaptic input over an intricate dendritic tree, subject to complex, nonlinearly interacting internal biological processes. A recent study proposed to characterize this complexity by fitting accurate surrogate models to replicate the input-output relationship of a detailed biophysical cortical pyramidal neuron model and discovered it needed temporal convolutional networks (TCN) with millions of parameters. Requiring this many parameters, however, could stem from a misalignment between the inductive biases of the TCN and the cortical neuron's computations. In light of this, and to explore the computational implications of leaky memory units and nonlinear dendritic processing, we introduce the Expressive Leaky Memory (ELM) neuron model, a biologically inspired phenomenological model of a cortical neuron. Remarkably, by exploiting such slowly decaying memory-like hidden states and two-layered nonlinear integration of synaptic input, our ELM neuron can accurately match the aforementioned input-output relationship with under ten thousand trainable parameters. To further assess the computational ramifications of our neuron design, we evaluate it on various tasks with demanding temporal structures, including the Long Range Arena (LRA) datasets, as well as a novel neuromorphic dataset based on the Spiking Heidelberg Digits dataset (SHD-Adding). Leveraging a larger number of memory units with sufficiently long timescales, and correspondingly sophisticated synaptic integration, the ELM neuron displays substantial long-range processing capabilities, reliably outperforming the classic Transformer or Chrono-LSTM architectures on LRA, and even solving the Pathfinder-X task with over 70% accuracy (16k context length).

Paper Structure (26 sections, 14 figures, 13 tables)


Figures (14)

  • Figure 1: The biologically motivated Expressive Leaky Memory (ELM) neuron model. The architecture can be divided into the following components: the input current synapse dynamics, the integration mechanism dynamics, the leaky memory dynamics, and the output dynamics. a) Sketch of a biological cortical pyramidal neuron segmented into the analogous architectural components using the corresponding colors. b) Schematics of the ELM neuron architecture, component-wise colored accordingly. c) The ELM neuron equations, where $\boldsymbol{x_t} \in \mathbb{R}^{d_s}$ is the input at time $t$, $\Delta t \in \mathbb{R}^{+}$ the fictitious elapsed time in milliseconds between two consecutive inputs $\boldsymbol{x_{t-1}}$ and $\boldsymbol{x_{t}}$, $\textcolor{blueVar}{\boldsymbol{m}} \in \mathbb{R}^{d_m}$ are memory units, $\textcolor{greenVar}{\boldsymbol{s}} \in \mathbb{R}^{d_s}$ the synapse currents (traces), $\textcolor{blueVar}{\boldsymbol{\tau_m}} \in (\mathbb{R}^{+})^{d_m}$ and $\textcolor{greenVar}{\boldsymbol{\tau_s}} \in (\mathbb{R}^{+})^{d_s}$ their respective timescales in milliseconds, $\textcolor{greenVar}{\boldsymbol{w_s}} \in (\mathbb{R}^{+})^{d_s}$ are synapse weights, $\textcolor{orangeVar}{\boldsymbol{w_p}}$ the weights of a Multilayer Perceptron ($\text{MLP}$) with $l_{\mathrm{mlp}}$ hidden layers of size $d_{\mathrm{mlp}}$, $\textcolor{purpleVar}{\boldsymbol{w_y}} \in \mathbb{R}^{d_o \times d_m}$ the output weights, $\textcolor{orangeVar}{\lambda} \in \mathbb{R}^{+}$ a scaling factor for the delta memory $\textcolor{orangeVar}{\Delta m_t} \in \mathbb{R}^{d_m}$, and $\textcolor{purpleVar}{\boldsymbol{y}} \in \mathbb{R}^{d_o}$ the output.
  • Figure 2: The ELM neuron is a computationally efficient model of a cortical neuron. a) A detailed biophysical model of a layer 5 cortical pyramidal cell was used to generate the NeuronIO dataset, consisting of input spikes, output spikes, and voltage. b) and c) Voltage and spike prediction performance of the respective surrogate models, produced using joint ablation of $d_m$ with $d_{\mathrm{mlp}}=2d_m$ for ELM models. Previously, around 10M parameters were required to make accurate spike predictions using a TCN \cite{beniaguev2021single}; an LSTM baseline achieves this with 266K, and our ELM and Branch-ELM neuron models require 53K and 8K respectively (3rd from left each), while simultaneously achieving much better voltage prediction performance than the TCN. For a comparison in terms of TP/FP rate performance or FLOPs cost, see Fig. \ref{fig:neuronio_model_and_results_branch}c or Fig. \ref{fig:neuronio_results_flops}, respectively. Additional comparisons to other phenomenological neuron models, such as LIF and ALIF, are provided in Table \ref{tbl:simple_models_on_neuronio}.
  • Figure 3: The ELM neuron gives relevant neuroscientific insights. Ablations on NeuronIO of different hyperparameters of an ELM neuron with AUC $\approx 0.992$, and of a Branch-ELM with the same default hyperparameters. The number of removed divergent runs is marked with $1^*$. a) We find that between 10 and 20 memory-like hidden states are required for accurate predictions, many more than typical phenomenological models use \cite{izhikevich2004model, dayan2005theoretical}. b) Highly nonlinear integration of synaptic input is required, in line with recent neuroscientific findings \cite{stuart2015dendritic, jones2022biological, larkum2022dendrites}. c) Allowing greater updates to the memory units is beneficial (see Appendix \ref{Suppl:implementation_details}). d-f) Ablations of the memory timescale range (initialization and bounding) or (constant) value, with the default range being 1 ms to 150 ms. Timescales around 25 ms seem to be the most useful (matching the typical membrane timescale in the cortex \cite{dayan2005theoretical}); however, their absence can be partially compensated by longer timescales. g) and h) Ablating the number of branches $d_{\mathrm{tree}}$ and the number of synapses per branch $d_{\mathrm{brch}}$ of the Branch-ELM neuron.
  • Figure 4: Coarse-grained modeling of synaptic integration significantly improves model efficiency. a) The integration mechanism dynamics of the ELM now first compute the activity of individual dendritic branches as a simple sum of their respective synaptic inputs before passing them on to the $\text{MLP}_{\textcolor{orangeVar}{\boldsymbol{w_p}}}$, where $d_{\mathrm{tree}}$ is the number of branches and $d_{\mathrm{brch}}$ the number of synapses per branch. b) Accurate predictions using a Branch-ELM neuron with 8104 parameters (for a zoomed-in version with model dynamics, see Figure \ref{fig:detailed_euron_inference}). c) The new Branch-ELM neuron improves on the ELM neuron by about 7$\times$ in terms of parameter efficiency (same ELM hyperparameters). Differences in model quality are highlighted when examining the True-Positive rate at a low False-Positive rate.
  • Figure 5: The ELM neuron performs well on long and sparse data using longer timescales. a) Sample from the biologically motivated SHD-Adding dataset (based on \cite{cramer2020heidelberg}); each dot is an input spike, and the vertical dashed line is a guide for the eye indicating the separation of the two digits (not communicated to the network). b-d) The ELM neuron (186K params.) consistently outperforms a classic LSTM (956K params.), especially for smaller bin sizes (meaning longer training samples), and LSTM performance cannot be fully recovered even for larger bin sizes. The Branch-ELM (67K params.) can retain performance for fine-grained binning at a much reduced model size. Our LIF-neuron-based Spiking Neural Network (SNN) (51K params.) does not manage to achieve good performance for any bin size, and training becomes unstable for long sequences. e) and f) Ablations using a bin size of 2 ms, with test set performance reported. e) Solving SHD-Adding requires the ELM neuron to have a higher complexity than required for NeuronIO, and much larger models become unstable. Potentially, a network of smaller ELM neurons might be preferable. f) Longer $\textcolor{blueVar}{\tau_m}$ are crucial for extracting long-range dependencies. Possibly shorter ones might suffice in an ELM network, as longer timescales can emerge through dynamics \cite{khajehabdollahi2023emergent}.
  • ...and 9 more figures
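
The caption of Figure 1 names the quantities of the ELM neuron, but the equations themselves sit in the figure panel and are not reproduced here. The following is a minimal sketch of what one update step could look like, assuming exponential decay factors exp(-Δt/τ) for both synapse traces and memory units, a tanh-bounded delta memory, and a linear readout. These update rules, the ReLU hidden layer, the helper names (`init_params`, `mlp`, `elm_step`), and the values chosen for `tau_s` and `lam` are illustrative assumptions rather than the authors' implementation; only the symbols and the 1 ms to 150 ms memory timescale range come from the captions. Passing `branch_size` mimics the Branch-ELM idea from Figure 4a of summing synaptic inputs per branch before the MLP.

```python
import math
import random


def init_params(d_s, d_m, d_mlp, d_o, branch_size=None, seed=0):
    """Hypothetical parameter container; shapes follow the Figure 1 caption."""
    rng = random.Random(seed)
    n_syn = math.ceil(d_s / branch_size) if branch_size else d_s
    d_in = n_syn + d_m  # MLP sees synaptic (or branch) activity plus memory

    def layer(n_out, n_in):
        scale = 1.0 / math.sqrt(n_in)
        W = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        return W, [0.0] * n_out

    return {
        "tau_s": [5.0] * d_s,  # assumed synapse timescale in ms
        # memory timescales spread over the 1 ms to 150 ms default range
        "tau_m": [1.0 + 149.0 * i / max(1, d_m - 1) for i in range(d_m)],
        "w_s": [rng.random() for _ in range(d_s)],  # non-negative weights
        "w_p": [layer(d_mlp, d_in), layer(d_m, d_mlp)],  # one hidden layer
        "w_y": layer(d_o, d_m)[0],
        "lam": 5.0,  # assumed scaling factor for the delta memory
    }


def mlp(layers, v):
    """Small MLP: ReLU on hidden layers (assumption), linear last layer."""
    for i, (W, b) in enumerate(layers):
        v = [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]
        if i < len(layers) - 1:
            v = [max(0.0, x) for x in v]
    return v


def elm_step(x, s, m, p, dt=1.0, branch_size=None):
    """One assumed ELM update; returns (y, s_new, m_new)."""
    kappa_s = [math.exp(-dt / t) for t in p["tau_s"]]
    kappa_m = [math.exp(-dt / t) for t in p["tau_m"]]
    # Synapse traces: leaky decay plus weighted input.
    s_new = [k * si + w * xi for k, si, w, xi in zip(kappa_s, s, p["w_s"], x)]
    # Branch variant: sum the traces of each group of branch_size synapses.
    syn = s_new
    if branch_size is not None:
        syn = [sum(s_new[i:i + branch_size])
               for i in range(0, len(s_new), branch_size)]
    # Nonlinear integration of synaptic activity and decayed memory.
    decayed = [k * mi for k, mi in zip(kappa_m, m)]
    delta_m = [math.tanh(v) for v in mlp(p["w_p"], syn + decayed)]
    # Leaky memory update, scaled by lam and by how much each unit forgets.
    m_new = [d + p["lam"] * (1.0 - k) * dm
             for d, k, dm in zip(decayed, kappa_m, delta_m)]
    # Linear readout from the memory units.
    y = [sum(w * mi for w, mi in zip(row, m_new)) for row in p["w_y"]]
    return y, s_new, m_new


# Usage: run a short random spike train through a single neuron.
params = init_params(d_s=4, d_m=3, d_mlp=6, d_o=1)
s, m = [0.0] * 4, [0.0] * 3
spikes = random.Random(1)
for _ in range(10):
    x = [float(spikes.random() < 0.2) for _ in range(4)]
    y, s, m = elm_step(x, s, m, params)
```

In this sketch a memory unit with a long timescale has a decay factor close to 1, so it forgets slowly and also accepts only small updates per step, which is one way to read the captions' emphasis on longer timescales for long-range dependencies.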