Event-Stream Super Resolution using Sigma-Delta Neural Network
Waseem Shariff, Joe Lemley, Peter Corcoran
TL;DR
This work tackles the limited spatial resolution of neuromorphic event cameras by introducing an end-to-end Sigma-Delta Neural Network (SDNN) that combines binary spikes with sigma-delta modulation to perform event-stream super-resolution. The method uses temporal difference and integration (ΔT, ΣT) to learn spatio-temporal distributions while operating on sparse, continuous-time representations, supported by a convolutional encoder–decoder architecture and a PSTH-based loss that combines $\mathrm{Loss}^{\mathrm{Temporal}}$ and $\mathrm{Loss}^{\mathrm{Spatial}}$. Across N-MNIST, CIFAR10-DVS, ASL-DVS, and Event-NFS (E-NFS), the SDNN achieves superior RMSE and PSNR while improving computational efficiency, with up to 17.04× higher event sparsity and 32.28× fewer synaptic operations than an equivalent ANN, and roughly 2× better performance than SNNs. The approach shows strong potential for real-time, energy-efficient event-based super-resolution and for downstream tasks such as object recognition, and it points to clear directions for hardware deployment and further optimization.
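To make the temporal difference and integration idea concrete, here is a minimal sketch of generic sigma-delta coding on a one-dimensional signal. It is not the authors' SDNN: the function names `delta_encode` and `sigma_decode`, the `threshold` parameter, and the truncating quantizer are all illustrative assumptions. The sender transmits a quantized difference only when the signal has drifted from the receiver's running estimate by at least the threshold, and the receiver integrates those messages to rebuild the signal.

```python
def delta_encode(signal, threshold):
    """Delta step (hypothetical helper): emit a quantized change only when
    the input differs from the receiver's estimate by at least `threshold`."""
    messages = []
    estimate = 0.0  # running value the receiver is assumed to hold
    for x in signal:
        residual = x - estimate
        if abs(residual) >= threshold:
            # Quantize to a whole number of threshold steps (truncate toward zero).
            message = threshold * int(residual / threshold)
        else:
            message = 0.0  # nothing sent: this is where the sparsity comes from
        estimate += message
        messages.append(message)
    return messages


def sigma_decode(messages):
    """Sigma step (hypothetical helper): integrate the sparse messages."""
    total = 0.0
    decoded = []
    for m in messages:
        total += m
        decoded.append(total)
    return decoded


signal = [0.0, 0.1, 0.6, 0.65, 0.7, 1.0, 1.0, 0.2]
messages = delta_encode(signal, threshold=0.25)
decoded = sigma_decode(messages)
nonzero = sum(1 for m in messages if m != 0.0)
print(messages, decoded, nonzero)
```

In this sketch, slowly varying input produces mostly zero messages, and the reconstruction error stays below the threshold at every step. A larger threshold trades reconstruction accuracy for fewer transmitted messages.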
Abstract
This study introduces a novel approach to enhance the spatial-temporal resolution of time-event pixels based on luminance changes captured by event cameras. These cameras present unique challenges due to their low resolution and the sparse, asynchronous nature of the data they collect. Current event super-resolution algorithms are not fully optimized for the distinct data structure produced by event cameras, which makes them inefficient at capturing the full dynamism and detail of visual scenes while keeping computational complexity low. To bridge this gap, our research proposes a method that integrates binary spikes with Sigma-Delta Neural Networks (SDNNs), leveraging a spatiotemporal constraint learning mechanism designed to learn the spatial and temporal distributions of the event stream simultaneously. The proposed network is evaluated on widely recognized benchmark datasets, including N-MNIST, CIFAR10-DVS, ASL-DVS, and Event-NFS. A comprehensive evaluation framework is employed, assessing both the accuracy of the model, through root mean square error (RMSE), and its computational efficiency. The findings demonstrate significant improvements over existing state-of-the-art methods. Specifically, the proposed method surpasses the state of the art in computational efficiency, achieving a 17.04-fold improvement in event sparsity and a 32.28-fold improvement in synaptic operation efficiency over traditional artificial neural networks, along with two-fold better performance than spiking neural networks.
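As an illustration of the accuracy metric named above, the following sketch computes RMSE between a super-resolved and a ground-truth event representation, together with the PSNR derived from it. It assumes both streams have already been binned into equally sized lists of event counts. That binning, the helper names `rmse` and `psnr`, and the `peak` argument are assumptions made for this example, not details taken from the paper.

```python
import math


def rmse(predicted, target):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(target) or not target:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))


def psnr(predicted, target, peak):
    """Peak signal-to-noise ratio in dB, where `peak` is the largest
    value the representation can take (an assumption of this sketch)."""
    error = rmse(predicted, target)
    if error == 0.0:
        return math.inf  # identical inputs
    return 20.0 * math.log10(peak / error)


# Hypothetical event counts per bin.
target_counts = [1, 1, 1, 1]
predicted_counts = [0, 0, 0, 0]
print(rmse(predicted_counts, target_counts))
print(psnr(predicted_counts, target_counts, peak=10))
```

Lower RMSE and higher PSNR both indicate a reconstruction closer to the ground truth.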
