SpikeMOT: Event-based Multi-Object Tracking with Sparse Motion Features
Song Wang, Zhu Wang, Can Li, Xiaojuan Qi, Hayden Kwok-Hay So
TL;DR
SpikeMOT addresses multi-object tracking with event cameras by combining a spiking neural network-based tracker with a frame-rate detector in a Siamese architecture. It extracts sparse spatiotemporal features from event voxels using SRM neurons to track motion at high frequency, and it maintains object identities through a detector–tracker–matcher pipeline. The accompanying DSEC-MOT dataset provides a realistic benchmark with severe occlusions and re-identification demands, and supports evaluation with the HOTA, IDF1, and CLEAR metrics. Experiments on DSEC-MOT and FE240hz show state-of-the-art tracking performance and robustness to background event noise, illustrating the practical value of sparse, temporally aware representations for event-based MOT.
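To make the term "event voxels" concrete, the following is a minimal sketch of one common way to bin an event stream into a voxel grid. It is not taken from the paper: the function name `events_to_voxel_grid`, the `(t, x, y, p)` event layout, and the hard assignment of each event to a single temporal bin are all assumptions for illustration, and the voxelization used by SpikeMOT may differ.

```python
from typing import List, Tuple

# Assumed event layout: (timestamp, x, y, polarity), with polarity in {-1, +1}.
Event = Tuple[float, int, int, int]


def events_to_voxel_grid(
    events: List[Event], num_bins: int, height: int, width: int
) -> List[List[List[int]]]:
    """Accumulate event polarities into a [num_bins][height][width] grid."""
    grid = [[[0] * width for _ in range(height)] for _ in range(num_bins)]
    if not events:
        return grid

    t_start = min(e[0] for e in events)
    t_end = max(e[0] for e in events)
    span = t_end - t_start

    for t, x, y, p in events:
        # Ignore events that fall outside the sensor area.
        if not (0 <= x < width and 0 <= y < height):
            continue
        if span > 0:
            # Map the timestamp to a bin; the last timestamp goes to the last bin.
            b = min(int((t - t_start) / span * num_bins), num_bins - 1)
        else:
            b = 0
        grid[b][y][x] += p
    return grid
```

Most cells of the resulting grid stay at zero for typical event data, which is the sparsity that a spiking tracker can exploit.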
Abstract
Compared with conventional RGB cameras, event cameras offer superior temporal resolution, which allows them to capture rich information between frames and makes them prime candidates for object tracking. In practice, however, research on event-based multi-object tracking (MOT) remains in its infancy, especially in real-world settings where events from complex backgrounds and camera motion can easily obscure the true target motion. In this work, we present SpikeMOT, an event-based multi-object tracker that addresses these challenges. SpikeMOT leverages spiking neural networks to extract sparse spatiotemporal features from the event streams associated with objects. The resulting spike train representations are used to track object movement at high frequency, while a simultaneous object detector provides updated spatial information of these objects at an equivalent frame rate. To evaluate the effectiveness of SpikeMOT, we introduce DSEC-MOT, the first large-scale event-based MOT benchmark incorporating fine-grained annotations for objects experiencing severe occlusions, frequent trajectory intersections, and long-term re-identification in real-world contexts. Extensive experiments on DSEC-MOT and another event-based dataset, FE240hz, demonstrate that SpikeMOT achieves high tracking accuracy in challenging real-world scenarios, advancing the state of the art in event-based multi-object tracking.
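As an illustration of how a spiking neuron turns an input event train into a sparse output spike train, here is a minimal discrete-time sketch of a single neuron, assuming that "SRM" in the TL;DR refers to the spike response model. The kernel shape, the parameter values, and the names `srm_kernel` and `simulate_srm` are assumptions chosen for illustration; they are not the network or settings described by the authors.

```python
import math
from typing import List


def srm_kernel(s: float, tau: float) -> float:
    """Alpha-shaped response kernel: zero for s <= 0, peak value 1 at s == tau."""
    return (s / tau) * math.exp(1.0 - s / tau) if s > 0 else 0.0


def simulate_srm(
    input_spikes: List[int],
    weight: float = 1.0,
    threshold: float = 1.0,
    tau_s: float = 2.0,  # assumed synaptic time constant (in time steps)
    tau_r: float = 2.0,  # assumed refractory time constant (in time steps)
) -> List[int]:
    """Return the output spike train (0/1 per step) of one SRM-style neuron."""
    output: List[int] = []
    for t in range(len(input_spikes)):
        u = 0.0
        # Excitation: sum of weighted responses to all earlier input spikes.
        for t_in in range(t):
            if input_spikes[t_in]:
                u += weight * srm_kernel(t - t_in, tau_s)
        # Refractoriness: each earlier output spike lowers the potential.
        for t_out in range(t):
            if output[t_out]:
                u -= 2.0 * threshold * srm_kernel(t - t_out, tau_r)
        output.append(1 if u >= threshold else 0)
    return output
```

In this sketch the neuron stays silent when input activity is weak and fires only when enough input spikes arrive close together in time, so isolated noise events produce no output while sustained motion does.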
