Spiking Diffusion Models

Jiahang Cao, Hanzhong Guo, Ziqing Wang, Deming Zhou, Hao Cheng, Qiang Zhang, Renjing Xu

TL;DR

This work introduces Spiking Diffusion Models (SDMs), a family of SNN-based generators that achieve high-quality image synthesis with substantially reduced energy consumption. Key innovations include the Temporal-wise Spiking Mechanism (TSM), which enables time-adaptive membrane dynamics, and a Threshold Guidance strategy that improves sample quality without any additional training. The authors report strong results on CIFAR-10, CelebA, and LSUN-bedroom, with energy savings relative to ANN baselines and performance comparable to their ANN diffusion counterparts; they also explore ANN-SNN conversion to extend applicability. Overall, SDMs advance energy-efficient generative modeling by leveraging neuromorphic principles while outperforming prior SNN-based generative models in image quality, and they open avenues for low-latency, low-power generation on neuromorphic hardware.

Abstract

Recent years have witnessed Spiking Neural Networks (SNNs) gaining attention for their ultra-low energy consumption and high biological plausibility compared with traditional Artificial Neural Networks (ANNs). Despite these distinguished properties, the application of SNNs to the computationally intensive field of image generation remains underexplored. In this paper, we propose Spiking Diffusion Models (SDMs), an innovative family of SNN-based generative models that excel in producing high-quality samples with significantly reduced energy consumption. In particular, we propose a Temporal-wise Spiking Mechanism (TSM) that allows SNNs to capture more temporal features from a bio-plasticity perspective. In addition, we propose a threshold-guided strategy that can further improve performance by up to 16.7% without any additional training. We also make the first attempt to use the ANN-SNN approach for SNN-based generation tasks. Extensive experimental results reveal that our approach not only exhibits comparable performance to its ANN counterpart with few spiking time steps, but also outperforms previous SNN-based generative models by a large margin. We also demonstrate the high-quality generation ability of SDM on large-scale datasets, e.g., LSUN bedroom. This development marks a pivotal advancement in the capabilities of SNN-based generation, paving the way for future research avenues to realize low-energy and low-latency generative applications. Our code is available at https://github.com/AndyCao1125/SDM.
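
The threshold-guided strategy mentioned in the abstract can be illustrated with a toy example. The sketch below is not taken from the paper's code: it assumes that threshold guidance amounts to shifting the firing threshold of already-trained spiking neurons at inference time, and it uses a hypothetical leaky integrate-and-fire (LIF) layer (`lif_forward`, `decay`, and the threshold values are all illustrative names and numbers) to show how such a shift changes the spike rate without touching any weights.

```python
import numpy as np

def lif_forward(currents, v_th=1.0, decay=0.5):
    """Run a toy LIF layer over T time steps (illustrative, not the paper's model).

    currents: array of shape (T, N), input current per time step and neuron.
    Returns a (T, N) array of 0/1 spikes.
    """
    v = np.zeros(currents.shape[1])
    spikes = np.zeros_like(currents)
    for t in range(currents.shape[0]):
        v = decay * v + currents[t]        # leaky integration of the input
        fired = v >= v_th                  # compare against the firing threshold
        spikes[t] = fired.astype(currents.dtype)
        v = np.where(fired, 0.0, v)        # hard reset for neurons that fired
    return spikes

rng = np.random.default_rng(0)
currents = rng.uniform(0.0, 1.0, size=(4, 1000))  # T=4 time steps, 1000 neurons

# Same inputs and (implicit) weights; only the threshold differs at inference.
rate_trained = lif_forward(currents, v_th=1.0).mean()  # threshold assumed during training
rate_lower = lif_forward(currents, v_th=0.9).mean()    # lower threshold: more spikes
rate_higher = lif_forward(currents, v_th=1.1).mean()   # higher threshold: fewer spikes

print(rate_lower, rate_trained, rate_higher)
```

Because only `v_th` changes, no retraining is involved; whether a lower or higher threshold helps sample quality is an empirical question that this toy example does not answer.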

Paper Structure (28 sections, 29 equations, 8 figures, 7 tables, 1 algorithm)

Figures (8)

  • Figure 1: Comparison of state-of-the-art SNN models. The FID is shown on a $\log_2$ scale and the marker size corresponds to the IS metric. Compared with other SNN generative models, our models achieve better FID while requiring fewer time steps.
  • Figure 2: Overview of our Spiking Diffusion Models. The learning process of SDM consists of two stages: (1) the training stage and (2) the fine-tuning stage. During the training stage, our spiking UNet adopts the standard Pre-spike Resblock (bottom left, Sec. \ref{subsec:pre_spike}); for the fine-tuning stage, the Pre-spike block is converted into the TSM block (bottom right, Sec. \ref{subsec:tsm}). Given a random Gaussian noise input $x_t$, it is first converted into a spike representation by a spiking encoder and then fed into the spiking UNet along with the time embeddings. The network transmits only spikes, which are represented by a $0/1$ vector ($\in \mathbb{Z}_{\{0,1\}}$). Finally, the output spikes are passed through a decoder to obtain the predicted noise $\epsilon$, and the loss is computed to update the network. In the fine-tuning stage, we load the weights from the training stage and substitute the Pre-spike block with the TSM block, where the temporal parameter $p$ is initialized as 1.0. This stage continues to optimize the network's parameters for better generative performance.
  • Figure 3: Overview of the temporal-wise spiking mechanism. After a spiking neuron fires, the spikes are converted in the pre-synapse to obtain the input current $I$. To derive more dynamic information, the temporal parameter $p$ acts on the current to form the time-adaptive current $\hat{I}$.
  • Figure 4: Unconditional image generation results on MNIST, Fashion-MNIST, CIFAR-10, CelebA, and LSUN-bedroom using direct training-based SDMs.
  • Figure 5: Comparison of the generation results with and without the TSM method on CIFAR-10.
  • ...and 3 more figures
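
The captions of Figures 2 and 3 describe a temporal parameter $p$ that acts on the input current $I$ to form a time-adaptive current $\hat{I}$, with $p$ initialized to 1.0 for fine-tuning. A minimal sketch of that idea follows. It assumes, as an illustration only, that $p$ holds one scalar per time step and scales the current multiplicatively ($\hat{I}[t] = p[t] \cdot I[t]$); the LIF dynamics, the names `lif_step` and `tsm_forward`, and all numeric values are hypothetical and not taken from the paper.

```python
import numpy as np

def lif_step(v, current, v_th=1.0, decay=0.5):
    """One step of a toy LIF neuron layer: integrate, fire, hard reset."""
    v = decay * v + current
    spike = (v >= v_th).astype(current.dtype)
    v = v * (1.0 - spike)                  # reset membrane where a spike occurred
    return v, spike

def tsm_forward(currents, p, v_th=1.0, decay=0.5):
    """Apply an assumed per-time-step scale p[t] to the current before the neuron.

    currents: (T, N) input currents I; p: (T,) temporal parameters.
    Returns (T, N) spikes with values in {0, 1}.
    """
    v = np.zeros(currents.shape[1:])
    out = []
    for t in range(currents.shape[0]):
        i_hat = p[t] * currents[t]         # time-adaptive current (assumed form)
        v, spike = lif_step(v, i_hat, v_th, decay)
        out.append(spike)
    return np.stack(out)

rng = np.random.default_rng(0)
T, N = 4, 8
currents = rng.uniform(0.0, 1.5, size=(T, N))

# p = 1.0 everywhere leaves the current unchanged, so the network starts
# fine-tuning with exactly the behaviour of the trained Pre-spike weights.
baseline = tsm_forward(currents, np.ones(T))

# Hypothetical fine-tuned values: each time step now weights its input differently.
p_tuned = np.array([0.5, 1.0, 1.5, 2.0])
tuned = tsm_forward(currents, p_tuned)

print(baseline.sum(), tuned.sum())
```

Under this assumption, initializing $p$ to 1.0 makes the TSM block a drop-in replacement for the Pre-spike block at the start of fine-tuning, which is consistent with loading the training-stage weights as described for Figure 2.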