Spatio-Temporal Cluster-Triggered Encoding for Spiking Neural Networks

Lingyun Ke, Minchi Hu

TL;DR

This work tackles the challenge of efficiently encoding static images and event streams for Spiking Neural Networks by preserving spatial semantics through a cluster-based, density-driven approach. The authors introduce two complementary encoders: 2D-CTE for static images and ST3D (3D-CTE) for temporally-rich event streams, both grounded in local density computations and cluster gating to produce sparse, informative spikes. On N-MNIST, ST3D attains 98.17% accuracy with roughly 3800 spikes per sample, surpassing the TTFS baseline and matching deeper architectures with fewer spikes, while 2D-CTE achieves strong MNIST performance with interpretable, low-complexity operations. The results demonstrate interpretable, energy-efficient encoding suitable for neuromorphic hardware, highlighting practical impact for real-time, low-power visual processing in SNNs.

Abstract

Encoding static images into spike trains is a crucial step for enabling Spiking Neural Networks (SNNs) to process visual information efficiently. However, existing schemes such as rate coding, Poisson encoding, and time-to-first-spike (TTFS) often ignore spatial relationships and yield temporally inconsistent spike patterns. In this article, a novel cluster-based encoding approach is proposed, which leverages local density computation to preserve semantic structure in both the spatial and temporal domains. The method first introduces a 2D spatial cluster trigger that identifies foreground regions through connected component analysis and local density estimation. It is then extended to a 3D spatio-temporal (ST3D) framework that jointly considers temporal neighborhoods, producing spike trains with improved temporal consistency. Experiments on the N-MNIST dataset demonstrate that the ST3D encoder achieves 98.17% classification accuracy with a simple single-layer SNN, outperforming standard TTFS encoding (97.58%) and matching the performance of more complex deep architectures while using significantly fewer spikes (~3800 vs ~5000 per sample). The results demonstrate that this approach provides an interpretable and efficient encoding strategy for neuromorphic computing applications.
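
To make the 2D cluster trigger more concrete, the following is a minimal Python sketch of the general idea described in the abstract: threshold the image, keep only connected foreground clusters, estimate local density, and let denser pixels spike earlier. It is not the authors' implementation. The function name `cluster_triggered_encode`, the parameters `fg_thresh`, `window`, `min_cluster_size`, and `n_steps`, and the linear density-to-latency mapping are all assumptions introduced here for illustration; the paper's own equations may differ.

```python
import numpy as np
from scipy import ndimage


def cluster_triggered_encode(image, fg_thresh=0.5, window=3,
                             min_cluster_size=4, n_steps=8):
    """Hypothetical 2D cluster-triggered encoder (illustrative sketch only).

    Returns a boolean spike tensor of shape (n_steps, H, W) with at most
    one spike per pixel.
    """
    img = np.asarray(image, dtype=np.float32)

    # Foreground mask: a simple intensity threshold (assumed, not from the paper).
    fg = img > fg_thresh

    # Connected component analysis with 8-connectivity.
    labels, n = ndimage.label(fg, structure=np.ones((3, 3), dtype=int))

    # Cluster gate: drop the background label and clusters that are too small.
    sizes = np.bincount(labels.ravel(), minlength=n + 1)
    keep = sizes >= min_cluster_size
    keep[0] = False
    gate = keep[labels]

    # Local density: fraction of foreground pixels in a window x window patch,
    # with pixels outside the image counted as background.
    density = ndimage.uniform_filter(fg.astype(np.float32), size=window,
                                     mode="constant")

    # Assumed latency mapping: denser neighborhoods spike earlier.
    t = np.round((1.0 - density) * (n_steps - 1)).astype(int)

    spikes = np.zeros((n_steps,) + img.shape, dtype=bool)
    ys, xs = np.nonzero(gate)
    spikes[t[ys, xs], ys, xs] = True
    return spikes
```

In this sketch, isolated noise pixels never spike because their cluster fails the size gate, which is one simple way to obtain the sparse, spatially coherent spike patterns the abstract describes. A spatio-temporal variant in the spirit of ST3D could apply the same steps to a (T, H, W) event volume with a 3D neighborhood, but that extension is not shown here.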

Paper Structure

This paper contains 32 sections, 14 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: 2D Spatial Cluster Encoding Pipeline.
  • Figure 2: 3D Spatial Cluster Encoding Pipeline for DVS events.
  • Figure 3: Convergence of the SNN on MNIST with Cluster-Triggered Encoding: accuracy vs. epoch (best 97.87% @ epoch 46).