UltraLIF: Fully Differentiable Spiking Neural Networks via Ultradiscretization and Max-Plus Algebra

Jose Marie Antonio Miñoza

TL;DR

UltraLIF introduces a principled differentiable SNN framework by applying ultradiscretization from tropical geometry to LIF-type dynamics, replacing surrogate gradients with soft max-plus relaxations based on the log-sum-exp function with a learnable temperature $\varepsilon$. It yields two instantiations, UltraLIF (temporal) and UltraDLIF (spatial), with forward-backward consistency and theoretical guarantees on convergence to classical LIF/diffusion dynamics and bounded gradients. Empirically, it improves over surrogate-gradient baselines across six benchmarks, especially in the single-timestep ($T=1$) setting on neuromorphic and temporal data, and offers an explicit sparsity mechanism to reduce energy consumption. The work links spiking computation with tropical geometry, enabling new analytical tools and potential neuromorphic deployment strategies.

Abstract

Spiking Neural Networks (SNNs) offer energy-efficient, biologically plausible computation but suffer from non-differentiable spike generation, necessitating reliance on heuristic surrogate gradients. This paper introduces UltraLIF, a principled framework that replaces surrogate gradients with ultradiscretization, a mathematical formalism from tropical geometry providing continuous relaxations of discrete dynamics. The central insight is that the max-plus semiring underlying ultradiscretization naturally models neural threshold dynamics: the log-sum-exp function serves as a differentiable soft-maximum that converges to hard thresholding as a learnable temperature parameter $\varepsilon \to 0$. Two neuron models are derived from distinct dynamical systems: UltraLIF from the LIF ordinary differential equation (temporal dynamics) and UltraDLIF from the diffusion equation modeling gap junction coupling across neuronal populations (spatial dynamics). Both yield fully differentiable SNNs trainable via standard backpropagation with no forward-backward mismatch. Theoretical analysis establishes pointwise convergence to classical LIF dynamics with quantitative error bounds and bounded non-vanishing gradients. Experiments on six benchmarks spanning static images, neuromorphic vision, and audio demonstrate improvements over surrogate gradient baselines, with gains most pronounced in single-timestep ($T{=}1$) settings on neuromorphic and temporal datasets. An optional sparsity penalty enables significant energy reduction while maintaining competitive accuracy.

Paper Structure (45 sections, 12 theorems, 33 equations, 3 figures, 21 tables)

Key Result

Lemma 3.2

Let $\mathbf{x} = (x_1, \ldots, x_n) \in \mathbb{R}^n$ and $M := \max_i x_i$. Then $M \leq \mathrm{LSE}_\varepsilon(\mathbf{x}) \leq M + \varepsilon \log n$, hence $\lim_{\varepsilon \to 0^+} \mathrm{LSE}_\varepsilon(\mathbf{x}) = M$. Moreover, $\nabla \mathrm{LSE}_\varepsilon(\mathbf{x}) = \mathrm{softmax}(\mathbf{x}/\varepsilon)$ ...
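
The sandwich bound in Lemma 3.2 is easy to check numerically. The sketch below assumes the standard scaled definition $\mathrm{LSE}_\varepsilon(\mathbf{x}) = \varepsilon \log \sum_i \exp(x_i/\varepsilon)$, which is consistent with the stated bounds but is not quoted in this excerpt. The function names `lse` and `softmax` are illustrative and do not come from the paper's code. The last loop uses finite differences to check the standard identity that the gradient of this function is the softmax of $\mathbf{x}/\varepsilon$.

```python
import math

def lse(xs, eps):
    """Scaled log-sum-exp: eps * log(sum_i exp(x_i / eps)), computed stably."""
    m = max(xs)
    # Subtract the max before exponentiating to avoid overflow for small eps.
    return m + eps * math.log(sum(math.exp((x - m) / eps) for x in xs))

def softmax(xs, eps):
    """Softmax of xs / eps, the gradient of lse with respect to xs."""
    m = max(xs)
    ws = [math.exp((x - m) / eps) for x in xs]
    total = sum(ws)
    return [w / total for w in ws]

xs = [0.3, -1.2, 0.9, 0.1]
M, n = max(xs), len(xs)

for eps in (1.0, 0.1, 0.01):
    value = lse(xs, eps)
    # Sandwich bound: M <= LSE_eps(x) <= M + eps * log(n).
    assert M <= value <= M + eps * math.log(n) + 1e-12
    print(f"eps={eps:<5} LSE={value:.6f} gap={value - M:.2e}")

# Finite-difference check of the gradient against softmax(x / eps).
eps, h = 0.5, 1e-6
grad = softmax(xs, eps)
for i in range(n):
    bumped = xs[:i] + [xs[i] + h] + xs[i + 1:]
    fd = (lse(bumped, eps) - lse(xs, eps)) / h
    assert abs(fd - grad[i]) < 1e-4
```

The printed gap shrinks with $\varepsilon$, which is the convergence to the hard maximum that the lemma states.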

Figures (3)

  • Figure 1: (a) Spike activation functions: Heaviside (hard threshold), surrogate gradient (smooth approximation), and ultradiscretized (principled soft relaxation). (b) Gradients: Heaviside has zero gradient almost everywhere (delta function at threshold); surrogate and ultradiscretized provide smooth gradients, but only ultradiscretized maintains forward-backward consistency.
  • Figure 2: Comparison of spike mechanisms. (a) Traditional LIF uses Heaviside $H(V-\theta)$ in the forward pass but a smooth surrogate $\sigma'$ for gradients, creating forward-backward mismatch (shaded region). (b) Ultradiscretized LIF (temporal, 2-term LSE from LIF ODE) and (c) Ultradiscretized DLIF (spatial, 3-term LSE from diffusion PDE) use identical smooth functions in both passes, ensuring gradient consistency. The membrane potential equations below each panel show the distinct derivations.
  • Figure 3: Epsilon ablation on MNIST ($T{=}1$, 100 epochs). (a) Learned $\varepsilon$ exhibits a characteristic U-shaped trajectory: initial drop from 1.0 to $\sim$0.42 (sharpening phase), followed by recovery to model-specific optima (0.66--1.08). This suggests the network first learns sharp discrimination, then softens for generalization. (b) Learned $\varepsilon$ (dashed lines) consistently matches or exceeds all fixed values across models, validating the benefit of learnable temperature.
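
The forward-backward consistency described in the captions for Figures 1 and 2 can be illustrated with a sigmoid soft threshold. This is a minimal sketch, not the paper's neuron model: Definitions 4.1 and 4.2 are not reproduced in this excerpt, and `soft_spike`, `soft_spike_grad`, and `heaviside` are hypothetical names chosen for illustration. It assumes a spike function $\sigma((V-\theta)/\varepsilon)$, shows that it approaches the Heaviside step as $\varepsilon$ shrinks, and checks that the gradient used is the exact derivative of the forward function rather than a separate surrogate.

```python
import math

def soft_spike(v, theta, eps):
    """Sigmoid relaxation of H(v - theta), stable for either sign of the argument."""
    z = (v - theta) / eps
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_spike_grad(v, theta, eps):
    """Exact derivative of soft_spike with respect to v."""
    s = soft_spike(v, theta, eps)
    return s * (1.0 - s) / eps

def heaviside(v, theta):
    """Hard threshold used by a classical LIF neuron."""
    return 1.0 if v >= theta else 0.0

theta = 1.0  # illustrative threshold, not a value from the paper
for v in (0.8, 1.2):
    errors = [abs(soft_spike(v, theta, eps) - heaviside(v, theta))
              for eps in (1.0, 0.1, 0.01)]
    # The relaxation error shrinks monotonically as eps decreases.
    assert errors[0] > errors[1] > errors[2]
    print(f"v={v}: errors for eps=1, 0.1, 0.01 -> {errors}")

# Backward pass equals the derivative of the forward pass (no mismatch).
v, eps, h = 1.1, 0.3, 1e-6
fd = (soft_spike(v + h, theta, eps) - soft_spike(v - h, theta, eps)) / (2 * h)
assert abs(fd - soft_spike_grad(v, theta, eps)) < 1e-6
```

A surrogate-gradient neuron would instead return `heaviside` in the forward pass while differentiating a different smooth function, which is the mismatch shown in Figure 2(a).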

Theorems & Definitions (28)

  • Definition 3.1: Max-Plus Semiring
  • Lemma 3.2: LSE Convergence
  • proof
  • Remark 3.3
  • Definition 4.1: UltraLIF Neuron (Temporal)
  • Definition 4.2: UltraDLIF Neuron (Spatial)
  • Lemma 5.1: Sigmoid Convergence
  • proof
  • Proposition 5.2: Convergence to LIF
  • proof
  • ...and 18 more