Sharpness Aware Surrogate Training for Spiking Neural Networks

Maximilian Nicholson

Abstract

Surrogate gradients are a standard tool for training spiking neural networks (SNNs), but conventional hard-forward/surrogate-backward training couples a nonsmooth forward model with a biased gradient estimator. We study Sharpness Aware Surrogate Training (SAST), which applies Sharpness Aware Minimization (SAM) to a surrogate forward SNN trained by backpropagation. In this formulation, the optimization target is an ordinary smooth empirical risk, so the training gradient is exact for the auxiliary model being optimized. Under explicit boundedness and contraction assumptions, we derive compact state stability and input Lipschitz bounds, establish smoothness of the surrogate objective, provide a first order SAM approximation bound, and prove a nonconvex convergence guarantee for stochastic SAST with an independent second minibatch. We also isolate a local mechanism proposition, stated separately from the unconditional guarantees, that links per sample parameter gradient control to smaller input gradient norms under local Jacobian conditioning. Empirically, we evaluate clean accuracy, hard spike transfer, corruption robustness, and training overhead on N-MNIST and DVS Gesture. The clearest practical effect is transfer gap reduction: on N-MNIST, hard spike accuracy rises from 65.7% to 94.7% (best at $\rho=0.30$) while surrogate forward accuracy remains high; on DVS Gesture, hard spike accuracy improves from 31.8% to 63.3% (best at $\rho=0.40$). We additionally specify the compute matched, calibration, and theory alignment controls required for a final practical assessment.
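To make the training scheme described in the abstract concrete, the following is a minimal sketch of one stochastic SAM step on a smooth surrogate model, under several assumptions that are not taken from the paper: the model is a single sigmoid "neuron" standing in for a surrogate forward SNN, the loss is a mean squared error, and the names `loss_and_grad`, `sam_step`, `beta`, `rho`, and `lr` are hypothetical. It also assumes that the first minibatch sets the ascent direction and the independent second minibatch supplies the descent gradient at the perturbed point; the paper's algorithm may assign the batches differently.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def loss_and_grad(w, X, y, beta=4.0):
    """Mean squared error of a smooth surrogate neuron and its exact gradient.

    The sigmoid with slope `beta` is an illustrative smooth stand-in for a
    hard spike threshold (an assumption, not the paper's surrogate).
    """
    s = sigmoid(beta * (X @ w))
    err = s - y
    loss = 0.5 * np.mean(err ** 2)
    # Chain rule through the sigmoid: ds/dw = beta * s * (1 - s) * x
    grad = X.T @ (err * beta * s * (1.0 - s)) / len(y)
    return loss, grad

def sam_step(w, batch_a, batch_b, rho=0.3, lr=0.1, eps=1e-12):
    """One SAM update using two minibatches (assumed roles, see lead-in)."""
    # Ascent direction from the first minibatch.
    _, g_a = loss_and_grad(w, *batch_a)
    w_adv = w + rho * g_a / (np.linalg.norm(g_a) + eps)
    # Descent gradient from the second minibatch, evaluated at the perturbed point.
    _, g_b = loss_and_grad(w_adv, *batch_b)
    return w - lr * g_b, w_adv

# Toy usage on synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 5))
y = (X @ rng.normal(size=5) > 0).astype(float)
w = np.zeros(5)
for _ in range(200):
    idx_a = rng.choice(len(X), size=32, replace=False)
    idx_b = rng.choice(len(X), size=32, replace=False)  # drawn independently of idx_a
    w, _ = sam_step(w, (X[idx_a], y[idx_a]), (X[idx_b], y[idx_b]))
```

Because the surrogate model is smooth, `loss_and_grad` returns the exact gradient of the objective being optimized, which is the property the abstract emphasizes; only the transfer to a hard threshold at evaluation time is left outside this sketch.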


Paper Structure

This paper contains 36 sections, 8 theorems, 64 equations, 2 figures, 4 tables, and 1 algorithm.

Introduction
Related work and positioning
Model and Training Objective
Notation
Hard spike and surrogate forward SNNs
Loss and empirical objective
Method: Sharpness Aware Surrogate Training
Implementation details used throughout.
Theory in the Main Paper
Setup assumptions
State stability, smoothness, and robustness
Experiments
Benchmarks, protocol, and theory alignment
Baselines and fairness.
Main results
...and 21 more sections

Key Result

Proposition 4.7

Under Assumptions ass:inputs--ass:gamma and Definition def:surrogate, the surrogate states are uniformly bounded. In particular, for every layer $\ell$ and time $t$, with one valid explicit choice where $R_z^{(0)}\mathrel{\mathop:}= R_x$ and $R_z^{(\ell-1)}\mathrel{\mathop:}= \sqrt{d_{\ell-1}}$ for $\ell\ge 2$. Consequently, the surrogate readout is input-Lipschitz: with one valid explicit choi

Figures (2)

Figure 1: Overview of the surrogate forward SNN used during training. At inference, the final trained model can be evaluated either with the surrogate nonlinearity or by replacing it with a hard threshold according to Definition 2.6.
Figure 2: Robustness on N-MNIST under random event-drop corruption. Test accuracy (mean $\pm$ std over seeds) is plotted against drop probability $p$ for both surrogate forward and hard spike evaluation; SAST trained models are expected to degrade more gracefully than the baseline as corruption increases, with moderate-to-large radii typically showing the strongest gains at high drop rates.
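The event-drop corruption named in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's protocol: events are assumed to be stored as rows of an array (for example `x, y, t, polarity`), each event is assumed to be dropped independently with probability $p$, and the function name `drop_events` is hypothetical.

```python
import numpy as np

def drop_events(events, p, rng):
    """Return a copy of `events` with each row dropped independently with probability p.

    `events` is assumed to be an (N, k) array with one event per row;
    the column layout is an assumption and is not used by this function.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("drop probability p must lie in [0, 1]")
    # rng.random draws from [0, 1), so a row is kept with probability 1 - p.
    keep = rng.random(len(events)) >= p
    return events[keep]

# Toy usage on a synthetic event stream (hypothetical data).
rng = np.random.default_rng(0)
events = rng.integers(0, 34, size=(1000, 4))
corrupted = drop_events(events, p=0.3, rng=rng)
```

Sweeping `p` over a grid and re-evaluating a trained model on the corrupted streams would produce an accuracy-versus-drop-probability curve of the kind the caption describes.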

Theorems & Definitions (25)

Definition 2.1: Hard spike LIF SNN with reset-by-subtraction
Remark 2.2: Theory aligned linear blocks
Definition 2.3: Admissible surrogate nonlinearity
Definition 2.4: Surrogate forward SNN
Remark 2.5: Scope of the theory
Definition 2.6: Hard spike evaluation protocol
Remark 2.7: Why surrogate-to-hard transfer can degrade
Remark 4.5
Remark 4.6: Run-level contraction diagnostic
Proposition 4.7: State stability and input Lipschitz continuity
...and 15 more
