Event-based Continuous Color Video Decompression from Single Frames
Ziyun Wang, Friedhelm Hamann, Kenneth Chaney, Wen Jiang, Guillermo Gallego, Kostas Daniilidis
TL;DR
ContinuityCam reconstructs high-quality color video from a single static frame and an aligned event stream by combining a continuous trajectory field with a tri-plane neural synthesis backbone. It introduces a continuous-time motion basis to model long-range pixel trajectories and a compact event feature encoding that enables fast frame synthesis at arbitrary times; the two are fused through a multiscale network with softmax splatting. The method achieves state-of-the-art performance on both standard benchmarks and the challenging E2D2 dataset, improving PSNR by up to 3.61 dB and reducing LPIPS by about one-third compared to strong baselines, and it also benefits downstream tasks such as AprilTag detection and Gaussian Splatting-based 3D reconstruction. A new single-lens beamsplitter setup captures tightly aligned color images and events, enabling robust evaluation under varied lighting and motion conditions. The approach is of practical interest for high-speed capture with reduced bandwidth and latency.
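To make the idea of a continuous-time motion basis concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simple polynomial basis b_k(t) = t^(k+1) over normalized time; the summary above does not specify the actual basis, and the function name `trajectory` and its parameters are hypothetical. It only illustrates how a small set of per-pixel coefficients lets a pixel's position be queried at any time rather than at fixed frame timestamps.

```python
def trajectory(x0, y0, coeffs, t):
    """Evaluate a 2D pixel trajectory at normalized time t in [0, 1].

    coeffs is a list of (cx, cy) pairs, one per basis function.
    ASSUMPTION: polynomial basis b_k(t) = t**(k + 1), chosen only for
    illustration; every basis function is zero at t = 0, so the pixel
    starts at its position in the initial image.
    """
    dx = sum(cx * t ** (k + 1) for k, (cx, _) in enumerate(coeffs))
    dy = sum(cy * t ** (k + 1) for k, (_, cy) in enumerate(coeffs))
    return x0 + dx, y0 + dy


# Hypothetical coefficients: linear motion in x, quadratic motion in y.
coeffs = [(4.0, 0.0), (0.0, 2.0)]
start = trajectory(10.0, 20.0, coeffs, 0.0)
middle = trajectory(10.0, 20.0, coeffs, 0.5)
end = trajectory(10.0, 20.0, coeffs, 1.0)
```

Because the trajectory is a closed-form function of t, querying an intermediate time costs no more than querying the endpoints.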
Abstract
We present ContinuityCam, a novel approach to generate a continuous video from a single static RGB image and an event camera stream. Conventional cameras struggle with high-speed motion capture due to bandwidth and dynamic range limitations. Event cameras are well suited to this problem because they encode compressed change information at high temporal resolution. In this work, we tackle the problem of event-based continuous color video decompression, pairing single static color frames and event data to reconstruct temporally continuous videos. Our approach combines continuous long-range motion modeling with a neural synthesis model, enabling frame prediction at arbitrary times within the time span covered by the events. Our method requires only an initial image, which increases robustness to sudden motions and light changes, minimizes prediction latency, and decreases bandwidth usage. We also introduce a novel single-lens beamsplitter setup that acquires aligned images and events, and a novel and challenging Event Extreme Decompression Dataset (E2D2) that tests the method under various lighting and motion profiles. We thoroughly evaluate our method by benchmarking color frame reconstruction, where it outperforms the baseline methods by 3.61 dB in PSNR and reduces LPIPS by 33%, and we show superior results on two downstream tasks.
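As a reading aid for the reported PSNR gain, the sketch below uses the standard definition PSNR = 10 log10(MAX^2 / MSE) to convert a gain in dB into the corresponding reduction in mean squared error. The helper names and the example MSE value are illustrative assumptions, not taken from the paper's evaluation code.

```python
import math


def psnr(mse: float, max_val: float = 1.0) -> float:
    """PSNR in dB for a given mean squared error and peak signal value."""
    if mse <= 0:
        raise ValueError("mse must be positive")
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which MSE shrinks when PSNR rises by gain_db."""
    return 10.0 ** (gain_db / 10.0)


# A 3.61 dB gain corresponds to roughly 2.3 times lower MSE.
ratio = mse_ratio_for_gain(3.61)

# Illustrative check with a made-up baseline MSE of 0.01 (20 dB).
baseline_db = psnr(0.01)
improved_db = psnr(0.01 / ratio)
```

Under this definition the conversion is independent of the baseline, so the same factor applies to any starting PSNR.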
