OVOSE: Open-Vocabulary Semantic Segmentation in Event-Based Cameras
Muhammad Rameez Ur Rahman, Jhony H. Giraldo, Indro Spinelli, Stéphane Lathuilière, Fabio Galasso
TL;DR
OVOSE addresses open-vocabulary semantic segmentation for event cameras, where labeled event data are scarce and existing methods are closed-set. It introduces a two-branch architecture with a grayscale-image branch and an event branch, both initialized from image foundation models, and uses synthetic data with knowledge distillation to transfer open-vocabulary capabilities to events. A dissimilarity network reweights the distillation losses to focus on well-reconstructed regions, and a mask generator plus a CLIP-style text encoder enable open-set class predictions. Evaluations on DDD17 and DSEC-Semantic show that OVOSE surpasses both closed-set event-segmentation baselines and image-based open-vocabulary adaptations in mIoU and accuracy. The work demonstrates practical potential for real-world open-vocabulary segmentation in event-based perception.
Abstract
Event cameras, known for low-latency operation and superior performance in challenging lighting conditions, are suitable for sensitive computer vision tasks such as semantic segmentation in autonomous driving. However, challenges arise due to limited event-based data and the absence of large-scale segmentation benchmarks. Current works are confined to closed-set semantic segmentation, limiting their adaptability to other applications. In this paper, we introduce OVOSE, the first Open-Vocabulary Semantic Segmentation algorithm for Event cameras. OVOSE leverages synthetic event data and knowledge distillation from a pre-trained image-based foundation model to an event-based counterpart, effectively preserving spatial context and transferring open-vocabulary semantic segmentation capabilities. We evaluate OVOSE on two driving semantic segmentation datasets, DDD17 and DSEC-Semantic, comparing it with existing image-based open-vocabulary models adapted for event-based data. We also compare OVOSE with state-of-the-art closed-set methods for unsupervised domain adaptation in event-based semantic segmentation. OVOSE demonstrates superior performance, showcasing its potential for real-world applications. The code is available at https://github.com/ram95d/OVOSE.
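
To make the idea of distillation focused on well-reconstructed regions concrete, the following is a minimal sketch, not the authors' implementation. It assumes per-pixel feature vectors from the image (teacher) and event (student) branches, a per-pixel dissimilarity score in [0, 1] where 0 means well reconstructed, a weight of 1 - dissimilarity, and a squared L2 feature distance. The function name `weighted_distillation_loss` and all of these choices are hypothetical; the actual losses used by OVOSE may differ.

```python
from typing import Sequence


def weighted_distillation_loss(
    teacher: Sequence[Sequence[float]],
    student: Sequence[Sequence[float]],
    dissimilarity: Sequence[float],
) -> float:
    """Weighted mean of per-pixel squared L2 feature distances.

    Assumption (illustrative only): each pixel is weighted by
    1 - dissimilarity, so poorly reconstructed pixels contribute less.
    """
    if not (len(teacher) == len(student) == len(dissimilarity)):
        raise ValueError("all inputs must cover the same number of pixels")

    # Clamp scores to [0, 1] and turn them into confidence weights.
    weights = [1.0 - min(max(d, 0.0), 1.0) for d in dissimilarity]
    total_weight = sum(weights)
    if total_weight == 0.0:
        # No pixel is trusted, so there is nothing to distill.
        return 0.0

    loss = 0.0
    for t_feat, s_feat, w in zip(teacher, student, weights):
        squared_distance = sum((a - b) ** 2 for a, b in zip(t_feat, s_feat))
        loss += w * squared_distance
    return loss / total_weight


if __name__ == "__main__":
    teacher = [[1.0, 0.0], [0.0, 1.0]]
    student = [[1.0, 0.0], [1.0, 1.0]]
    # The second pixel is marked as poorly reconstructed and is ignored.
    print(weighted_distillation_loss(teacher, student, [0.0, 1.0]))
    # Both pixels are trusted equally.
    print(weighted_distillation_loss(teacher, student, [0.0, 0.0]))
```

Normalizing by the sum of the weights keeps the loss scale comparable across samples with different amounts of trusted area; this is a design choice of the sketch, not something stated in the abstract.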
