SpikeStereoNet: A Brain-Inspired Framework for Stereo Depth Estimation from Spike Streams
Zhuoheng Gao, Yihao Li, Jiyao Zhang, Rui Zhao, Tong Wu, Hao Tang, Zhaofei Yu, Hao Dong, Guozhang Chen, Tiejun Huang
TL;DR
SpikeStereoNet addresses stereo depth estimation from asynchronous spike streams with a brain-inspired iterative refinement framework built on a recurrent spiking neural network (RSNN). It fuses multi-scale spike features, builds a correlation pyramid, and uses adaptive leaky integrate-and-fire (ALIF) neurons to iteratively improve disparity estimates, supervised by a composite loss that balances accuracy, firing rate, and membrane dynamics. The authors provide a large synthetic spike dataset and a real-world spike dataset, report state-of-the-art performance and strong data efficiency, and show effective domain adaptation from synthetic to real spike data. The work advances neuromorphic stereo vision by enabling direct, high-temporal-resolution depth sensing from spike streams, and it offers benchmarks to accelerate future research.
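To make the role of the ALIF neurons more concrete, here is a minimal NumPy sketch of a generic discrete-time adaptive leaky integrate-and-fire update. It is not taken from the paper: the function names `alif_step` and `run_alif`, the soft-reset rule, and all parameter values (`alpha`, `rho`, `v_th`, `beta`) are illustrative assumptions. The point it shows is that each spike raises the neuron's own threshold, so sustained input produces fewer spikes than a plain LIF neuron would emit.

```python
import numpy as np

def alif_step(v, a, x, alpha=0.9, rho=0.95, v_th=1.0, beta=0.5):
    """One step of a generic ALIF neuron (all constants are assumed values).

    v: membrane potential, a: adaptation trace, x: input current.
    """
    v = alpha * v + x                  # leaky integration of the input
    theta = v_th + beta * a            # threshold rises with recent spiking
    s = (v >= theta).astype(v.dtype)   # emit a spike where the threshold is crossed
    v = v - s * theta                  # soft reset: subtract the threshold on a spike
    a = rho * a + s                    # adaptation trace decays and accumulates spikes
    return v, a, s

def run_alif(inputs, **params):
    """Run the neuron over a (T, N) input array and return (T, N) binary spikes."""
    v = np.zeros_like(inputs[0])
    a = np.zeros_like(inputs[0])
    spikes = []
    for x in inputs:
        v, a, s = alif_step(v, a, x, **params)
        spikes.append(s)
    return np.stack(spikes)

if __name__ == "__main__":
    x = np.full((50, 1), 0.6)           # constant drive for 50 time steps
    plain = run_alif(x, beta=0.0)       # beta = 0 reduces to a plain LIF neuron
    adaptive = run_alif(x, beta=0.5)
    print("LIF spikes:", int(plain.sum()), "ALIF spikes:", int(adaptive.sum()))
```

Setting `beta=0.0` disables adaptation, which gives a simple baseline for comparing firing rates under the same input.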
Abstract
Conventional frame-based cameras often struggle with stereo depth estimation in rapidly changing scenes. In contrast, bio-inspired spike cameras emit asynchronous events at microsecond-level resolution, providing an alternative sensing modality. However, specialized stereo algorithms and benchmarks tailored to spike data are still lacking. To address this gap, we propose SpikeStereoNet, a brain-inspired framework and the first to estimate stereo depth directly from raw spike streams. The model fuses raw spike streams from two viewpoints and iteratively refines its depth estimates through a recurrent spiking neural network (RSNN) update module. To benchmark our approach, we introduce a large-scale synthetic spike stream dataset and a real-world stereo spike dataset with dense depth annotations. SpikeStereoNet outperforms existing methods on both datasets by leveraging the ability of spike streams to capture subtle edges and intensity shifts in challenging regions such as textureless surfaces and extreme lighting conditions. Furthermore, our framework exhibits strong data efficiency, maintaining high accuracy even with substantially reduced training data. The source code and datasets will be publicly available.
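The TL;DR mentions a correlation pyramid for matching the two viewpoints. As a rough illustration of that idea, the sketch below builds a row-wise correlation volume between left and right feature maps and average-pools it along the right-image axis, in the style commonly used by iterative stereo networks. It is an assumption about the general technique, not the authors' implementation: the function names, the `(C, H, W)` feature layout, the `1/sqrt(C)` scaling, and the pooling scheme are all hypothetical choices.

```python
import numpy as np

def correlation_volume(f_left, f_right):
    """Dot-product similarity between pixels on the same image row.

    f_left, f_right: feature maps of shape (C, H, W) (assumed layout).
    Returns corr of shape (H, W, W), where corr[y, x1, x2] compares
    left pixel (y, x1) with right pixel (y, x2).
    """
    c = f_left.shape[0]
    return np.einsum("chi,chj->hij", f_left, f_right) / np.sqrt(c)

def correlation_pyramid(corr, levels=3):
    """Average-pool the last (right-image) axis by 2 at each level."""
    pyramid = [corr]
    for _ in range(levels - 1):
        h, w1, w2 = corr.shape
        w2_even = w2 - (w2 % 2)  # drop a trailing column if the width is odd
        corr = corr[:, :, :w2_even].reshape(h, w1, w2_even // 2, 2).mean(axis=-1)
        pyramid.append(corr)
    return pyramid

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left = rng.standard_normal((4, 2, 8))
    right = rng.standard_normal((4, 2, 8))
    for level in correlation_pyramid(correlation_volume(left, right)):
        print(level.shape)
```

Coarser levels cover a wider disparity range per lookup, which is why an iterative update module would typically sample from all levels around its current disparity estimate.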
