BSO: Binary Spiking Online Optimization Algorithm

Yu Liang, Yu Yang, Wenjie Wei, Ammar Belatreche, Shuai Wang, Malu Zhang, Yang Yang

TL;DR

Binary Spiking Online (BSO) is an online training algorithm designed for Binary Spiking Neural Networks (BSNNs). It reduces training memory by updating weights directly via flip signals, which removes the need for latent weights. A temporal-aware extension, T-BSO, uses first- and second-order gradient moments to adapt the flipping threshold across time steps, capturing BSNN dynamics with thresholds $\gamma_t = \gamma \sqrt{v^l[t] + \epsilon}$. The authors prove regret bounds for both methods under standard online learning assumptions, showing $R(\mathcal{T}) = O(\sqrt{\mathcal{T}})$ for BSO and $R(\mathcal{T}) = O(\mathcal{T}^{3/4})$ for T-BSO, i.e., sublinear regret in both cases. Experiments on CIFAR-10/100, ImageNet, and DVS-CIFAR10 report competitive accuracy and significantly lower training memory than BPTT-based BSNNs while training online. Code is released at https://github.com/hamings1/BSO.
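
As a reading aid, the regret bounds can be unpacked with a short worked step. The regret definition below is the standard online-learning one and is an assumption about the paper's setup rather than a quotation from it; $w_k$ (weights at step $k$), $f_k$ (loss at step $k$), and $\mathcal{W}$ (the feasible weight set) are illustrative symbols.

$$
R(\mathcal{T}) = \sum_{k=1}^{\mathcal{T}} f_k(w_k) - \min_{w \in \mathcal{W}} \sum_{k=1}^{\mathcal{T}} f_k(w)
$$

Dividing each stated bound by $\mathcal{T}$ gives the average regret per step:

$$
\frac{R(\mathcal{T})}{\mathcal{T}} = O\!\left(\mathcal{T}^{-1/2}\right) \ \text{(BSO)}, \qquad \frac{R(\mathcal{T})}{\mathcal{T}} = O\!\left(\mathcal{T}^{-1/4}\right) \ \text{(T-BSO)}
$$

Both quantities tend to $0$ as $\mathcal{T} \to \infty$, which is what "sublinear" means here. Under these stated orders, the bound for T-BSO decays more slowly than the bound for BSO.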

Abstract

Binary Spiking Neural Networks (BSNNs) offer promising efficiency advantages for resource-constrained computing. However, their training algorithms often require substantial memory overhead due to latent weight storage and temporal processing requirements. To address this issue, we propose the Binary Spiking Online (BSO) optimization algorithm, a novel online training algorithm that significantly reduces training memory. BSO directly updates weights through flip signals under the online training framework. These signals are triggered when the product of gradient momentum and weights exceeds a threshold, eliminating the need for latent weights during training. To enhance performance, we propose T-BSO, a temporal-aware variant that leverages the inherent temporal dynamics of BSNNs by capturing gradient information across time steps for adaptive threshold adjustment. Theoretical analysis establishes convergence guarantees for both BSO and T-BSO, with formal regret bounds characterizing their convergence rates. Extensive experiments demonstrate that both BSO and T-BSO achieve superior optimization performance compared to existing training methods for BSNNs. The code is available at https://github.com/hamings1/BSO.
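
The flip rule described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `bso_step` and `tbso_step`, the exponential-moving-average form of the moments, and all default hyperparameters are assumptions made for illustration. Only the flip condition (momentum times weight exceeding a threshold) and the T-BSO threshold $\gamma_t = \gamma \sqrt{v^l[t] + \epsilon}$ come from the text.

```python
import numpy as np


def bso_step(w, m, grad, beta=0.9, gamma=1e-2):
    """One BSO-style update on binary weights w in {-1, +1} (illustrative).

    The moving-average form of the momentum is an assumption; the flip
    condition m * w > gamma follows the abstract's description.
    """
    m = beta * m + (1.0 - beta) * grad   # gradient momentum (assumed EMA form)
    flip = (m * w) > gamma               # flip signal: momentum * weight above threshold
    w = np.where(flip, -w, w)            # flip in place of a latent-weight update
    return w, m


def tbso_step(w, m, v, grad, beta1=0.9, beta2=0.999, gamma=1.0, eps=1e-8):
    """One T-BSO-style update with a temporal-aware threshold (illustrative).

    The second moment v rescales the threshold: gamma_t = gamma * sqrt(v + eps).
    The EMA forms of m and v are assumptions.
    """
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    gamma_t = gamma * np.sqrt(v + eps)   # per-weight adaptive flipping threshold
    flip = (m * w) > gamma_t
    w = np.where(flip, -w, w)
    return w, m, v


# Usage: a single step on four binary weights.
w0 = np.array([1.0, -1.0, 1.0, -1.0])
g0 = np.array([0.5, 0.5, -0.5, 0.0])
w1, m1 = bso_step(w0, np.zeros(4), g0)
w2, m2, v2 = tbso_step(w0, np.zeros(4), np.zeros(4), g0)
```

In this sketch only the binary weights and the moment buffers are stored, so no separate real-valued latent copy of the weights is needed. A weight flips only when its sign agrees with the sign of the accumulated gradient and the product is large enough.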

Paper Structure

This paper contains 28 sections, 5 theorems, 39 equations, 5 figures, 7 tables, 1 algorithm.

Key Result

Theorem 4.1

Assume that the function $f_k$ has bounded gradients, $\|\nabla f_k(w)\|_2 \leq G$ and $\|\nabla f_k(w)\|_\infty \leq G_\infty$, for all $w \in \mathbb{R}^d$. If $\gamma$ and $\beta$ decay by $\sqrt{k}$, BSO achieves a regret guarantee for all $k \geq 1$, with $R(\mathcal{T}) = O(\sqrt{\mathcal{T}})$. If $\beta_1$, $\beta_2$, and $\gamma$ decay by $\sqrt{k}$, T-BSO achieves a regret guarantee for all $k \geq 1$, with $R(\mathcal{T}) = O(\mathcal{T}^{3/4})$.

Figures (5)

  • Figure 1: Forward and backward propagation of BSNNs under BPTT method.
  • Figure 2: Weight update strategies of BSO and T-BSO during backpropagation. (a) BSO employs the same flipping threshold $\gamma$ at each time step, updating synaptic weights through the product of the momentum $M^l$ and the binary weights $W^l$ in BSNNs. (b) T-BSO incorporates the second-order momentum $v^l[t]$ into $\gamma$, yielding a temporal-aware flipping threshold $\gamma_{t}$ that is allocated dynamically across time steps.
  • Figure 3: The cosine similarity between the gradients across time steps. Both horizontal and vertical axes represent time steps.
  • Figure 4: Memory cost comparison between the widely used BPTT algorithm in BSNN and our BSO/T-BSO across varying time steps.
  • Figure 5: Performance comparison between our BSO and T-BSO under different time steps. T-BSO consistently outperforms BSO across time steps, and its performance improves more as the number of time steps grows.
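
The quantity plotted in Figure 3 can be computed along the following lines. This is a minimal sketch under assumptions: `gradient_cosine_matrix` is a hypothetical helper name, and the input is assumed to be one flattened gradient vector per time step. How the paper actually collects those gradients is not stated here.

```python
import numpy as np


def gradient_cosine_matrix(grads):
    """Pairwise cosine similarity between per-time-step gradients.

    grads: array-like of shape (T, D), one flattened gradient per time step
    (an assumed layout). Returns a (T, T) matrix whose entry (i, j) is the
    cosine similarity between the gradients at time steps i and j.
    """
    g = np.asarray(grads, dtype=float)
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    unit = g / np.maximum(norms, 1e-12)  # guard against all-zero gradients
    return unit @ unit.T


# Usage: three toy "time steps" with two-dimensional gradients.
toy_grads = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sim = gradient_cosine_matrix(toy_grads)
```

Both axes of the resulting matrix index time steps, matching the axis description in the Figure 3 caption. The diagonal is 1 for any nonzero gradient.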

Theorems & Definitions (9)

  • Theorem 4.1
  • Corollary 4.2
  • Definition 1.1
  • Lemma 1.2
  • Definition 1.3
  • Theorem 1.6
  • proof
  • Theorem 1.7
  • proof