Sharpness Aware Surrogate Training for Spiking Neural Networks

Maximilian Nicholson

Abstract

Surrogate gradients are a standard tool for training spiking neural networks (SNNs), but conventional training with a hard forward pass and a surrogate backward pass couples a nonsmooth forward model with a biased gradient estimator. We study Sharpness Aware Surrogate Training (SAST), which applies Sharpness Aware Minimization (SAM) to a surrogate forward SNN trained by backpropagation. In this formulation, the optimization target is an ordinary smooth empirical risk, so the training gradient is exact for the auxiliary model being optimized. Under explicit boundedness and contraction assumptions, we derive compact state stability and input Lipschitz bounds, establish smoothness of the surrogate objective, provide a first order SAM approximation bound, and prove a nonconvex convergence guarantee for stochastic SAST with an independent second minibatch. We also isolate a local mechanism proposition, stated separately from the unconditional guarantees, that links per sample parameter gradient control to smaller input gradient norms under local Jacobian conditioning. Empirically, we evaluate clean accuracy, hard spike transfer, corruption robustness, and training overhead on N-MNIST and DVS Gesture. The clearest practical effect is a reduction of the transfer gap: on N-MNIST, hard spike accuracy rises from 65.7% to 94.7% (best at $\rho=0.30$) while surrogate forward accuracy remains high; on DVS Gesture, hard spike accuracy improves from 31.8% to 63.3% (best at $\rho=0.40$). We additionally specify the compute matched, calibration, and theory alignment controls required for a final practical assessment.
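The two-step structure of a SAM-style update with an independent second minibatch can be illustrated with a minimal sketch. It is not the paper's algorithm: the toy least-squares loss stands in for the smooth surrogate risk, the helper names (`batch_grad`, `mean_loss`, `sast_step`) are hypothetical, and the assignment of the first minibatch to the ascent direction and the second to the descent gradient is an assumption made for illustration.

```python
import math
import random

def batch_grad(w, batch):
    """Mean gradient of the toy smooth loss 0.5 * (w.x - y)^2 over a minibatch."""
    g = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            g[i] += err * xi / len(batch)
    return g

def mean_loss(w, batch):
    """Mean of 0.5 * (w.x - y)^2 over a minibatch."""
    total = 0.0
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        total += 0.5 * err * err
    return total / len(batch)

def sast_step(w, batch_a, batch_b, rho=0.3, lr=0.1, eps=1e-12):
    """One SAM-style update using two minibatches (illustrative assumption)."""
    g_a = batch_grad(w, batch_a)
    norm = math.sqrt(sum(g * g for g in g_a))
    # Ascent: perturb the weights by radius rho along the normalized gradient.
    w_adv = [wi + rho * gi / (norm + eps) for wi, gi in zip(w, g_a)]
    # Descent: gradient at the perturbed point, taken on the second minibatch.
    g_b = batch_grad(w_adv, batch_b)
    return [wi - lr * gi for wi, gi in zip(w, g_b)]

# Toy usage: noise-free linear data, so the loss is smooth with a known minimizer.
rng = random.Random(0)
w_true = [1.0, -2.0]

def sample_batch(n=8):
    batch = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        batch.append((x, sum(a * b for a, b in zip(w_true, x))))
    return batch

held_out = sample_batch(256)
w = [0.0, 0.0]
initial = mean_loss(w, held_out)
for _ in range(300):
    w = sast_step(w, sample_batch(), sample_batch(), rho=0.05, lr=0.1)
final = mean_loss(w, held_out)
```

Because the toy loss is differentiable everywhere, both gradients in the step are exact gradients of the objective being minimized, which is the property the abstract attributes to the surrogate forward formulation.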

Paper Structure

This paper contains 36 sections, 8 theorems, 64 equations, 2 figures, 4 tables, 1 algorithm.

Key Result

Proposition 4.7

Under Assumptions ass:inputs--ass:gamma and Definition def:surrogate, the surrogate states are uniformly bounded. In particular, for every layer $\ell$ and time $t$, with one valid explicit choice where $R_z^{(0)}\mathrel{\mathop:}= R_x$ and $R_z^{(\ell-1)}\mathrel{\mathop:}= \sqrt{d_{\ell-1}}$ for $\ell\ge 2$. Consequently, the surrogate readout is input-Lipschitz: with one valid explicit choi

Figures (2)

  • Figure 1: Overview of the surrogate forward SNN used during training. At inference, the final trained model can be evaluated either with the surrogate nonlinearity or by replacing it with a hard threshold according to Definition 2.6.
  • Figure 2: Robustness on N-MNIST under random event-drop corruption. Test accuracy (mean $\pm$ std over seeds) is plotted against drop probability $p$ for both surrogate forward and hard spike evaluation; SAST trained models are expected to degrade more gracefully than the baseline as corruption increases, with moderate-to-large radii typically showing the strongest gains at high drop rates.
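The random event-drop corruption in Figure 2 can be sketched as independent thinning of an event stream. This is a minimal illustration under assumptions: the event tuple layout `(t, x, y, polarity)` and the helper name `drop_events` are hypothetical, and the paper's exact corruption procedure may differ.

```python
import random

def drop_events(events, p, rng):
    """Keep each event independently with probability 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # rng.random() lies in [0, 1), so p = 0 keeps everything and p = 1 drops everything.
    return [e for e in events if rng.random() >= p]

# Toy usage: 1000 synthetic events as (t, x, y, polarity) tuples (assumed layout).
rng = random.Random(0)
events = [(t, t % 34, (t * 7) % 34, t % 2) for t in range(1000)]
kept_none_dropped = drop_events(events, 0.0, rng)
kept_half = drop_events(events, 0.5, rng)
kept_all_dropped = drop_events(events, 1.0, rng)
```

Sweeping `p` over a grid and re-evaluating the same trained model at each value would give an accuracy-versus-drop-probability curve of the kind the caption describes.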

Theorems & Definitions (25)

  • Definition 2.1: Hard spike LIF SNN with reset-by-subtraction
  • Remark 2.2: Theory aligned linear blocks
  • Definition 2.3: Admissible surrogate nonlinearity
  • Definition 2.4: Surrogate forward SNN
  • Remark 2.5: Scope of the theory
  • Definition 2.6: Hard spike evaluation protocol
  • Remark 2.7: Why surrogate-to-hard transfer can degrade
  • Remark 4.5
  • Remark 4.6: Run-level contraction diagnostic
  • Proposition 4.7: State stability and input Lipschitz continuity
  • ...and 15 more
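The relationship between the hard spike model, the surrogate forward model, and the hard spike evaluation protocol named in Definitions 2.1, 2.4, and 2.6 can be illustrated with a single-neuron sketch. Everything specific here is an assumption rather than the paper's definition: the leak factor `beta`, the threshold `theta`, the sigmoid surrogate with steepness `k`, and the helper names are hypothetical.

```python
import math

def hard_spike(v, theta=1.0):
    """Heaviside threshold: emits 1.0 when the membrane potential reaches theta."""
    return 1.0 if v >= theta else 0.0

def sigmoid_surrogate(v, theta=1.0, k=5.0):
    """Smooth stand-in for the threshold (assumed form, not the paper's)."""
    return 1.0 / (1.0 + math.exp(-k * (v - theta)))

def lif_run(inputs, spike_fn, beta=0.9, theta=1.0):
    """Leaky integrate-and-fire neuron with reset-by-subtraction."""
    v, out = 0.0, []
    for x in inputs:
        v = beta * v + x       # leaky integration of the input current
        z = spike_fn(v, theta) # hard or surrogate output at this step
        v = v - theta * z      # reset by subtraction, scaled by the output
        out.append(z)
    return out

# Same dynamics and inputs, two forward nonlinearities.
currents = [0.6, 0.6, 0.6]
hard_out = lif_run(currents, hard_spike)
soft_out = lif_run(currents, sigmoid_surrogate)
```

Swapping `spike_fn` while keeping every other quantity fixed mirrors the idea of evaluating one trained model under either nonlinearity; the difference between `hard_out` and `soft_out` is a toy analogue of the surrogate-to-hard transfer gap.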