MouseSIS: A Frames-and-Events Dataset for Space-Time Instance Segmentation of Mice
Friedhelm Hamann, Hanxiong Li, Paul Mieske, Lars Lewejohann, Guillermo Gallego
TL;DR
MouseSIS introduces space-time instance segmentation (SIS) for event-based data and provides the first public dataset with pixel-accurate masks for up to seven mice, recorded as aligned grayscale frames and events through a beamsplitter system. The work presents two baseline approaches, ModelMixSort (tracking-by-detection) and EventSeqFormer (a tracking-by-query transformer), to benchmark SIS on each modality and on their combination. Experimental results show that incorporating event data can improve tracking performance, though challenges remain in low-contrast, high-noise sequences and in integrating the modalities end-to-end. The dataset (33 sequences, ~640 seconds total, ~75,000 masks) is a resource for developing robust, high-time-resolution tracking under difficult conditions and encourages broader application in biology and neuroscience. Overall, MouseSIS advances event-based scene understanding by enabling fine-grained, mask-level tracking across time, which can inform future space-time tracking algorithms.
Abstract
Enabled by large annotated datasets, tracking and segmentation of objects in videos have made remarkable progress in recent years. Despite these advancements, algorithms still struggle under degraded conditions and during fast movements. Event cameras are novel sensors with high temporal resolution and high dynamic range that offer promising advantages to address these challenges. However, annotated data for developing learning-based mask-level tracking algorithms with events is not available. To this end, we introduce: (i) a new task termed space-time instance segmentation, similar to video instance segmentation, whose goal is to segment instances throughout the entire duration of the sensor input (here, the input consists of quasi-continuous events and, optionally, aligned frames); and (ii) MouseSIS, a dataset for the new task, containing aligned grayscale frames and events. It includes annotated ground-truth labels (pixel-level instance segmentation masks) of a group of up to seven freely moving and interacting mice. We also provide two reference methods, which show that leveraging event data can consistently improve tracking performance, especially when used in combination with conventional cameras. The results highlight the potential of event-aided tracking in difficult scenarios. We hope our dataset opens the field of event-based video instance segmentation and enables the development of robust tracking algorithms for challenging conditions. Project repository: https://github.com/tub-rip/MouseSIS
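
To make the tracking-by-detection paradigm mentioned in the TL;DR more concrete, the following is a minimal, generic sketch of one association step: per-timestamp instance masks are matched to existing track IDs by mask IoU, and unmatched masks start new tracks. This is not the authors' ModelMixSort implementation. The pixel-set mask representation, the helper names (`mask_iou`, `associate`, `square`), the greedy matching, and the threshold of 0.3 are all assumptions chosen for illustration.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]  # (x, y) image coordinate
Mask = Set[Pixel]        # hypothetical mask representation: set of foreground pixels


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two pixel-set masks."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def associate(
    tracks: Dict[int, Mask],
    detections: List[Mask],
    next_id: int,
    iou_threshold: float = 0.3,  # assumed value, not taken from the paper
) -> Tuple[Dict[int, Mask], int]:
    """Greedily assign detections to track IDs in order of descending IoU.

    Tracks without a match are dropped in this sketch; unmatched detections
    receive fresh IDs from the running counter `next_id`.
    """
    pairs = sorted(
        (
            (mask_iou(mask, det), track_id, det_idx)
            for track_id, mask in tracks.items()
            for det_idx, det in enumerate(detections)
        ),
        reverse=True,
    )
    updated: Dict[int, Mask] = {}
    used: Set[int] = set()
    for iou, track_id, det_idx in pairs:
        if iou < iou_threshold:
            break  # all remaining pairs overlap even less
        if track_id in updated or det_idx in used:
            continue  # each track and each detection is matched at most once
        updated[track_id] = detections[det_idx]
        used.add(det_idx)
    for det_idx, det in enumerate(detections):
        if det_idx not in used:
            updated[next_id] = det
            next_id += 1
    return updated, next_id


def square(x0: int, y0: int, size: int) -> Mask:
    """Toy square mask used only for the usage example."""
    return {(x, y) for x in range(x0, x0 + size) for y in range(y0, y0 + size)}


# Two existing tracks; both instances shift by one pixel and a third one appears.
tracks = {0: square(0, 0, 4), 1: square(10, 10, 4)}
detections = [square(11, 10, 4), square(1, 0, 4), square(30, 30, 4)]
new_tracks, new_next_id = associate(tracks, detections, next_id=2)
print(sorted(new_tracks))  # track IDs 0 and 1 persist, the new instance gets ID 2
```

Greedy matching keeps the sketch short; a practical tracker would typically use an optimal assignment and keep unmatched tracks alive for some time to handle occlusions, which are frequent when several animals interact.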
