Event-Enhanced Snapshot Compressive Videography at 10K FPS
Bo Zhang, Jinli Suo, Qionghai Dai
TL;DR
The paper addresses ultrafast videography under low data bandwidth by combining intensity-based snapshot compressive imaging with event-camera information. It introduces a dual-path hardware design and a dual-branch Transformer that jointly use coded intensity measurements and asynchronous events to reconstruct dense high-speed frames at 10K FPS, demonstrated on both simulated and real data. The key contributions are a compact, photon-efficient dual-path optical setup and an architecture that fuses intensity and event information for both dense-frame reconstruction and timestamp-aware interpolation, outperforming state-of-the-art video SCI and VFI methods. The approach offers a practical pathway to high-throughput, megapixel-rate videography with low-cost sensors, though it is currently limited by the lack of real-time processing and by its reliance on specialized hardware.
Abstract
Video snapshot compressive imaging (SCI) encodes a dynamic scene compactly into a single snapshot and reconstructs its high-speed frame sequence afterward, greatly reducing the required data footprint and transmission bandwidth and enabling high-speed imaging with a low-frame-rate intensity camera. In practice, the high-speed dynamics are encoded with temporally varying patterns, so only the frames at the corresponding temporal intervals can be reconstructed, and the dynamics occurring between consecutive frames are lost. To unlock the potential of conventional snapshot compressive videography, we propose a novel hybrid "intensity+event" imaging scheme that incorporates an event camera into a video SCI setup. The proposed system uses a dual-path optical setup to record the coded intensity measurement and the intermediate event signals simultaneously; it is compact and photon-efficient because it collects the half of the photons that conventional video SCI discards. Correspondingly, we develop a dual-branch Transformer that exploits the reciprocal relationship between the two data modalities to decode dense video frames. Extensive experiments on both simulated and real-captured data demonstrate the superiority of our method over state-of-the-art video SCI and video frame interpolation (VFI) methods. By leveraging both the intrinsic redundancy in videos and the unique features of event cameras, the new hybrid design achieves high-quality videography at 0.1 ms time intervals with a low-cost CMOS image sensor working at 24 FPS.
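To make the encoding step concrete, the following is a minimal sketch of the standard video SCI forward model, in which each high-speed frame is multiplied element-wise by its own coding pattern and the products are summed into one snapshot. It is an illustration under stated assumptions, not the authors' implementation: the function name `sci_snapshot`, the binary masks, the tiny frame size, and the number of coded frames are all hypothetical, and sensor noise, the event path, and the Transformer decoder are omitted.

```python
import random


def sci_snapshot(frames, masks):
    """Simulate one coded snapshot: Y = sum_t (C_t * X_t), element-wise.

    frames: list of T frames, each a list of rows of pixel intensities.
    masks:  list of T coding patterns with the same shape as the frames.
    Both names and the data layout are assumptions made for this sketch.
    """
    height, width = len(frames[0]), len(frames[0][0])
    snapshot = [[0.0] * width for _ in range(height)]
    for frame, mask in zip(frames, masks):
        for r in range(height):
            for c in range(width):
                # Each frame is modulated by its own temporally varying pattern
                # before being integrated on the sensor.
                snapshot[r][c] += mask[r][c] * frame[r][c]
    return snapshot


# Hypothetical toy scene: 8 high-speed frames of 4x4 pixels.
random.seed(0)
num_frames, size = 8, 4
frames = [[[random.random() for _ in range(size)] for _ in range(size)]
          for _ in range(num_frames)]
# Random binary masks: each pattern passes some pixels and blocks the rest.
masks = [[[random.randint(0, 1) for _ in range(size)] for _ in range(size)]
         for _ in range(num_frames)]

snapshot = sci_snapshot(frames, masks)

# 0.1 ms intervals correspond to 10,000 frames per second. Relative to a
# 24 FPS sensor, that is the ratio below (an upper bound that ignores
# readout dead time; it is not a figure taken from the paper).
time_steps_per_exposure = 10_000 / 24
```

Only the frames that received a coding pattern appear in this sum, which is why a conventional video SCI system cannot recover the dynamics between consecutive coded frames. The ratio of 10,000 to 24 is about 417, which gives a sense of how many 0.1 ms time steps fall within one frame period of the intensity camera.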
