A Hybrid Spiking-Convolutional Neural Network Approach for Advancing Machine Learning Models
Sanaullah, Kaushik Roy, Ulrich Rückert, Thorsten Jungeblut
TL;DR
The paper addresses image inpainting by integrating temporal dynamics, through spiking neurons, into a convolutional framework. It introduces a standalone hybrid SC-NN with one SNNConv2d (LIF) layer and five CNNConv2d layers, trained on a masked version of the LSDIR dataset with a mean squared error (MSE) loss and the Adam optimizer. The LIF dynamics are described by $\frac{dV(t)}{dt} = \frac{I(t) - V(t)/R}{C}$, with a spike emitted and the membrane potential reset when $V$ reaches the threshold $V_{\rm th}$; leakage is modeled by $\frac{dV}{dt} = -\frac{V}{\tau}$. These dynamics provide temporal context in addition to spatial features. Results show a training MSE of $0.015$ and a validation MSE of $0.0017$, with qualitative inpainting improvements over state-of-the-art baselines, illustrating the value of fusing temporal spiking processing with conventional CNN features for vision tasks.
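As an illustration of the LIF dynamics above, the following is a minimal sketch of a single neuron integrated with a forward-Euler step. It is not the paper's SNNConv2d layer: the function name `simulate_lif`, the step size `dt`, and the values chosen for `R`, `C`, `v_th`, and `v_reset` are assumptions made for this example. With zero input current the update reduces to $\frac{dV}{dt} = -\frac{V}{RC}$, which matches the leakage term when $\tau = RC$.

```python
def simulate_lif(current, R=1.0, C=1.0, v_th=1.0, v_reset=0.0, dt=0.1):
    """Forward-Euler integration of dV/dt = (I(t) - V(t)/R) / C.

    All default parameter values are illustrative assumptions,
    not values taken from the paper.
    """
    v = v_reset
    voltages, spikes = [], []
    for i_t in current:
        # One Euler step of the membrane equation.
        v += dt * (i_t - v / R) / C
        if v >= v_th:
            # Threshold crossed: emit a spike and reset the potential.
            spikes.append(1)
            v = v_reset
        else:
            spikes.append(0)
        voltages.append(v)
    return voltages, spikes


# A constant input whose steady state I*R exceeds v_th makes the neuron fire;
# a weaker input settles below threshold and never spikes.
_, strong_spikes = simulate_lif([2.0] * 100)
_, weak_spikes = simulate_lif([0.5] * 100)
print(sum(strong_spikes), sum(weak_spikes))
```

The spike train, rather than the raw membrane potential, is what a spiking layer would pass on to the following convolutional layers.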
Abstract
In this article, we propose a novel standalone hybrid Spiking-Convolutional Neural Network (SC-NN) model and evaluate it on image inpainting tasks. Our approach combines the distinctive capabilities of SNNs, such as event-based computation and temporal processing, with the strong representation learning abilities of CNNs to generate high-quality inpainted images. The model is trained on a custom dataset designed specifically for image inpainting, in which missing regions are created using masks. The hybrid model consists of SNNConv2d layers and traditional CNN layers. The SNNConv2d layers implement the leaky integrate-and-fire (LIF) neuron model and capture spiking behavior, while the CNN layers capture spatial features. Training uses a mean squared error (MSE) loss: the model reaches a training loss of 0.015 and a validation loss as low as 0.0017 on the testing set. Furthermore, extensive experimental results demonstrate state-of-the-art performance, showcasing the potential of integrating temporal dynamics and feature extraction in a single network for image inpainting.
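The masking and MSE loss mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `apply_mask` and `mse`, the rectangular mask shape, the fill value, and the list-of-lists image representation are all hypothetical and are not taken from the paper, which does not specify its mask shapes here.

```python
def apply_mask(image, top, left, height, width, fill=0.0):
    """Return a copy of a 2D image with a rectangular region set to `fill`.

    The rectangular mask is an assumption made for this sketch.
    """
    masked = [row[:] for row in image]
    for r in range(top, top + height):
        for c in range(left, left + width):
            masked[r][c] = fill
    return masked


def mse(pred, target):
    """Mean squared error over all pixels of two equally sized 2D images."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count


# A 4x4 image of ones with a 2x2 hole: 4 of 16 pixels differ by 1.0.
image = [[1.0] * 4 for _ in range(4)]
masked = apply_mask(image, top=1, left=1, height=2, width=2)
print(mse(masked, image))  # 0.25
```

In an inpainting setup of this kind, the masked image would serve as the network input and the original image as the target of the MSE loss.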
