V2CE: Video to Continuous Events Simulator
Zhongyang Zhang, Shuyang Cui, Kaidong Chai, Haowen Yu, Subhasis Dasgupta, Upal Mahbub, Tauhidur Rahman
TL;DR
V2CE generates continuous DVS-like event streams from ordinary videos with a two-stage pipeline. Stage 1 converts video into motion-aware event voxels using a 3D UNet trained with a comprehensive set of losses. Stage 2 recovers precise, continuous event timestamps from those voxels via Local Dynamics-Aware Timestamp Inference (LDATI). Validated on MVSEC, the approach shows higher voxel fidelity than the baselines, and its sampling strategy preserves temporal dynamics, yielding near-ground-truth event counts with low timestamp error. The work also introduces new metrics tailored to DVS event characteristics, enabling quantitative evaluation of both voxel-level predictions and continuous event streams. Overall, V2CE reports state-of-the-art performance and real-time throughput, offering a practical path to high-fidelity DVS data generation and pretraining for event-based tasks.
Abstract
Dynamic Vision Sensor (DVS)-based solutions have recently garnered significant interest across various computer vision tasks, offering notable benefits in dynamic range, temporal resolution, and inference speed. However, as a relatively nascent vision sensor compared to Active Pixel Sensor (APS) devices such as RGB cameras, DVS suffers from a dearth of labeled datasets. Prior efforts to convert APS data into events often suffer from a considerable domain shift from real events, the absence of quantified validation, and layering problems along the time axis. In this paper, we present a novel method for video-to-event-stream conversion that addresses these issues from multiple perspectives, taking the specific characteristics of DVS into account. A series of carefully designed losses significantly enhances the quality of the generated event voxels. We also propose a novel local dynamics-aware timestamp inference strategy that accurately recovers event timestamps from event voxels in a continuous fashion and eliminates the temporal layering problem. Rigorous validation with quantified metrics at every stage of the pipeline establishes our method as the current state of the art (SOTA).
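To make the temporal layering problem concrete, the following is a minimal sketch of a naive voxel-to-event conversion. It is not the LDATI strategy proposed here: it simply draws each timestamp uniformly inside its time bin rather than pinning every event to the bin boundary or centre, which is what produces visible layers along the time axis. The function name `voxel_to_events`, the nested-list layout `voxel[polarity][bin][y][x]`, and the convention that channel 0 holds OFF events are all assumptions made for illustration.

```python
import random


def voxel_to_events(voxel, t_start, t_end, seed=None):
    """Expand an event-count voxel into a time-sorted list of (t, x, y, polarity).

    Hypothetical baseline, not the paper's LDATI: timestamps are sampled
    uniformly within each time bin. `voxel[p][b][y][x]` is assumed to hold a
    non-negative integer event count; channel 0 is assumed to be OFF (-1).
    """
    rng = random.Random(seed)
    n_bins = len(voxel[0])
    bin_width = (t_end - t_start) / n_bins
    events = []
    for p, channel in enumerate(voxel):
        polarity = -1 if p == 0 else 1
        for b, frame in enumerate(channel):
            for y, row in enumerate(frame):
                for x, count in enumerate(row):
                    for _ in range(count):
                        # Bin start plus a uniform offset inside the bin.
                        t = t_start + (b + rng.random()) * bin_width
                        events.append((t, x, y, polarity))
    events.sort(key=lambda e: e[0])  # event streams are ordered by time
    return events


# 2 polarities x 4 time bins x 3 rows x 3 columns, all empty to start.
voxel = [[[[0] * 3 for _ in range(3)] for _ in range(4)] for _ in range(2)]
voxel[1][2][1][0] = 3  # three ON events at (x=0, y=1) in bin 2
voxel[0][0][2][2] = 1  # one OFF event at (x=2, y=2) in bin 0
events = voxel_to_events(voxel, t_start=0.0, t_end=0.04, seed=0)
```

Uniform sampling removes the hard layers but ignores how activity actually evolves within and across neighbouring bins; inferring timestamps from those local dynamics is the gap the proposed strategy is meant to close.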
