NeuronSpark: A Spiking Neural Network Language Model with Selective State Space Dynamics

Zhengzheng Tang

Abstract

We ask whether a pure spiking backbone can learn large-scale language modeling from random initialization, without Transformer distillation. We introduce NeuronSpark, a 0.9B-parameter SNN language model trained with next-token prediction and surrogate gradients. The model combines selective state-space spiking dynamics, leakage-current inter-layer communication, PonderNet adaptive timesteps, fused Triton PLIF kernels, and stabilization techniques (residual centering, lateral-inhibition normalization, and natural-gradient compensation). Under a constrained budget (about 1.4B pretraining tokens and 6.5K SFT steps), NeuronSpark-0.9B reaches a pretraining loss of 3.6 and shows early multi-turn dialogue behavior after SFT. These results support the feasibility of end-to-end language modeling with a pure SNN architecture at this scale.

Paper Structure (52 sections, 8 equations, 6 figures, 7 tables)

Introduction
Contributions.
Related Work
Spiking Neural Networks for Language.
State Space Models.
Adaptive Computation.
Surrogate Gradient Training.
Architecture
Overview
PLIF Neuron Dynamics
PLIFNode (fixed parameters).
SelectivePLIFNode (dynamic parameters).
Membrane Potential Leakage Activation
Selective State Space SNN Block
Input projections
...and 37 more sections

Figures (6)

Figure 1: NeuronSpark architecture overview. The residual stream carries continuous values $\mathbf{h}$; PLIFLeak denotes PLIF neurons with leakage activation $(1-\beta)\cdot V_{\text{post}}$. PonderNet aggregation (applied per sublayer) collapses $K$ frames per token with learned geometric-distribution weights. Inter-layer communication uses floating-point leakage-current signals; binary spikes are internal firing events rather than the default layer-to-layer representation. The decode stage uses a uniform $K$-frame mean. Residual centering (subtracting the per-token mean) is applied before each residual addition.
Figure 2: Pretraining loss curve over 85K steps ($\sim$1.4B tokens). Loss decreases from 9.0 to $\sim$3.5. Training throughput: $\sim$960 tokens/sec on 8$\times$ RTX 4090 GPUs.
Figure 3: Training loss: final architecture (blue) vs. 9 ablation variants. None of the ablation variants achieves a loss below 7.0.
Figure 4: Biological interpretability of NeuronSpark-0.9B-Chat. (a) Per-token E[K]: punctuation receives fewer steps than content words. (b) POS-level E[K]: function words and punctuation are lower by $\sim$0.7. (c) Per-layer E[K]: SNNBlock increases with depth, while SNNFFN stays near 7--8. (d) Learned $\beta$ distribution: 67.3% fast ($<0.9$), 32.7% slow ($\geq 0.9$).
Figure 5: Surprisal vs. E[K] (40 Chinese sentences, 541 tokens). (a) The apparent correlation is dominated by BOS tokens: $r=-0.50$ overall, $r=-0.12$ without BOS. (b) Binned E[K] is nearly flat across surprisal, indicating allocation is largely independent of predictive difficulty.
...and 1 more figure
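
The mechanisms named in the Figure 1 caption can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation: the leaky-integration rule, the soft reset, the threshold `v_th`, the single scalar halting rate `lam` (which must lie in (0, 1]), the renormalization over $K$ frames, and the choice to center the sublayer update rather than the stream are all assumptions, and every function name is hypothetical. Only the leakage activation $(1-\beta)\cdot V_{\text{post}}$, the geometric weighting of $K$ frames, and the per-token mean subtraction come from the caption.

```python
def plif_leak_step(v_prev, x, beta, v_th=1.0):
    """One neuron step; returns (v_post, spike, leak)."""
    v_pre = beta * v_prev + (1.0 - beta) * x   # assumed leaky integration
    spike = 1.0 if v_pre >= v_th else 0.0      # binary firing event, kept internal
    v_post = v_pre - spike * v_th              # assumed soft reset
    leak = (1.0 - beta) * v_post               # leakage activation from the caption
    return v_post, spike, leak


def geometric_weights(lam, num_frames):
    """Geometric weights lam * (1 - lam)^k, renormalized over the frames."""
    raw = [lam * (1.0 - lam) ** k for k in range(num_frames)]
    total = sum(raw)
    return [r / total for r in raw]


def ponder_aggregate(frames, lam):
    """Collapse K frames (each a list of floats) into one weighted-sum vector."""
    weights = geometric_weights(lam, len(frames))
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]


def expected_steps(lam, num_frames):
    """Mean frame index (1-based) under the weights, an E[K]-style quantity."""
    weights = geometric_weights(lam, num_frames)
    return sum((k + 1) * w for k, w in enumerate(weights))


def center(vec):
    """Residual centering: subtract the per-token mean across features."""
    mean = sum(vec) / len(vec)
    return [v - mean for v in vec]


def sublayer(h, beta, lam, num_frames=4):
    """Toy sublayer: K PLIF frames -> aggregate -> center -> residual add."""
    v = [0.0] * len(h)
    frames = []
    for _ in range(num_frames):
        frame = []
        for i, x in enumerate(h):
            v[i], _spike, leak = plif_leak_step(v[i], x, beta[i])
            frame.append(leak)  # the floating-point signal passed onward
        frames.append(frame)
    update = center(ponder_aggregate(frames, lam))
    return [hi + ui for hi, ui in zip(h, update)]


if __name__ == "__main__":
    h = [0.5, 1.5, -1.0]
    print(sublayer(h, beta=[0.8, 0.9, 0.5], lam=0.4))
    print(expected_steps(0.4, 4))
```

Because the update is centered before the addition, the per-token sum of the residual stream is unchanged by this toy sublayer, which is one way to see why centering could limit drift in the stream's mean.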
