Event-based Continuous Color Video Decompression from Single Frames

Ziyun Wang, Friedhelm Hamann, Kenneth Chaney, Wen Jiang, Guillermo Gallego, Kostas Daniilidis

TL;DR

ContinuityCam reconstructs high-quality color video from a single static frame and an aligned event stream by combining a continuous trajectory field with a tri-plane neural synthesis backbone. It introduces a continuous-time motion basis to model long-range pixel trajectories and a compact event feature encoding that enables fast frame synthesis at arbitrary times; the two are fused through a multiscale network with softmax splatting. The method achieves state-of-the-art performance on both standard datasets and the challenging E2D2 dataset, improving PSNR by up to 3.61 dB and reducing LPIPS by about one-third compared to strong baselines, while also benefiting downstream tasks such as AprilTag detection and Gaussian Splatting-based 3D reconstruction. A new single-lens beamsplitter setup provides tightly aligned color-image and event data, enabling robust evaluation under varied lighting and motion conditions and offering practical impact for high-speed capture with reduced bandwidth and latency.

Abstract

We present ContinuityCam, a novel approach to generate a continuous video from a single static RGB image and an event camera stream. Conventional cameras struggle with high-speed motion capture due to bandwidth and dynamic range limitations. Event cameras are ideal sensors to solve this problem because they encode compressed change information at high temporal resolution. In this work, we tackle the problem of event-based continuous color video decompression, pairing single static color frames and event data to reconstruct temporally continuous videos. Our approach combines continuous long-range motion modeling with a neural synthesis model, enabling frame prediction at arbitrary times within the events. Our method only requires an initial image, thus increasing the robustness to sudden motions and light changes, minimizing the prediction latency, and decreasing bandwidth usage. We also introduce a novel single-lens beamsplitter setup that acquires aligned images and events, and a novel and challenging Event Extreme Decompression Dataset (E2D2) that tests the method in various lighting and motion profiles. We thoroughly evaluate our method by benchmarking color frame reconstruction, outperforming the baseline methods by 3.61 dB in PSNR and by a 33% decrease in LPIPS, as well as showing superior results on two downstream tasks.
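
To put the reported 3.61 dB PSNR gain in perspective, the sketch below uses the standard definition PSNR = 10 log10(peak^2 / MSE) to convert a dB gain into the equivalent reduction in mean squared error. The function names and the peak value of 1.0 are assumptions made for illustration and do not come from the paper; since benchmark PSNR is usually averaged over many frames, the conversion is only a rough guide.

```python
import math


def psnr(mse: float, peak: float = 1.0) -> float:
    """Standard PSNR in dB for a given mean squared error and peak value."""
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which MSE shrinks when PSNR rises by gain_db."""
    return 10.0 ** (gain_db / 10.0)


if __name__ == "__main__":
    ratio = mse_ratio_for_gain(3.61)
    print(f"A 3.61 dB gain corresponds to roughly {ratio:.2f}x lower MSE")
```

Under these assumptions, a 3.61 dB improvement on a single image corresponds to an MSE that is a little more than halved.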

Paper Structure (29 sections, 9 equations, 11 figures, 5 tables)

Figures (11)

  • Figure 1: Event-based continuous color video decompression uses an initial frame and subsequent events to generate frames. The prediction relies on continuous motion estimation, neural synthesis and image generation modules.
  • Figure 2: For natural camera motions, it is common to have sharp frames followed by blurry frames (shaded blue region above), which rules out interpolation methods. Our method can still reconstruct video in these scenarios because it does not depend on a second frame.
  • Figure 3: Overview. The initial frame and the long-range event volumes are concatenated to form the network input. Top blue box: A continuous motion network regresses, from the events and the initial frame, the motion coefficients that generate the point trajectory of every pixel. Bottom orange box: The input is projected to tri-plane features. A lightweight decoder queries the features and synthesizes pixel RGB values. Bottom red box: Optical flow is computed between the initial frame and the synthesized latent frame. We compute another set of features and warped images as pyramids. Right green box: Finally, the splatted features and images are merged with the synthesized images via a multi-scale fusion network into a high-quality color image prediction.
  • Figure 4: Continuous long-term trajectory output on test sequences of the BS-ERGB dataset [Tulyakov22cvpr]. We show pixel tracks of uniformly initialized features using motion coefficients predicted from events. The network outputs dense tracks (i.e., per-pixel) in a single feedforward pass. Our continuous basis-enabled motion module can decode complex long-range motions of up to 1 second.
  • Figure 5: Qualitative Evaluation: We present two qualitative examples from E2D2 and BS-ERGB [Tulyakov22cvpr], respectively. Our method, ContinuityCam, demonstrates enhanced accuracy in reconstructing geometry, even with challenging deformable subjects, such as in the "Fire" sequence (d). This improvement is attributed to the effective use of event data. Notably, in low-light conditions, as seen in the "Gnome" sequence (a), our approach markedly reduces motion blur compared to traditional image acquisition methods. While FILM [reda2022film] generates plausible results, it fails to accurately predict geometry in all examples. DMVFN [hu2023cvpr] struggles with occlusions, particularly those caused by rotational movements, as evident in the "Gnome" sequence.
  • ...and 6 more figures
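
The captions for Figures 3 and 4 describe a network that regresses per-pixel motion coefficients, which are then decoded into continuous point trajectories. The following is a minimal sketch of that decoding idea only. The captions do not say which basis is used, so the polynomial basis here is a stand-in chosen for illustration, and `poly_basis`, `decode_positions`, and the `(H, W, K, 2)` coefficient layout are hypothetical names and shapes rather than the authors' implementation.

```python
import numpy as np


def poly_basis(t: float, num_basis: int) -> np.ndarray:
    """Hypothetical motion basis [t, t^2, ..., t^K]; every term is zero at t = 0."""
    return np.array([t ** (k + 1) for k in range(num_basis)], dtype=np.float64)


def decode_positions(coeffs: np.ndarray, t: float) -> np.ndarray:
    """Decode per-pixel positions at time t from motion coefficients.

    coeffs: array of shape (H, W, K, 2), assumed to hold K basis weights per
            pixel for the x and y displacement (illustrative layout).
    Returns an (H, W, 2) array of (x, y) positions at the queried time.
    """
    h, w, k, _ = coeffs.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    start = np.stack([xs, ys], axis=-1).astype(np.float64)
    # Displacement is the basis-weighted sum of the coefficients.
    disp = np.einsum("k,hwkc->hwc", poly_basis(t, k), coeffs)
    return start + disp


if __name__ == "__main__":
    coeffs = np.zeros((2, 3, 2, 2))
    coeffs[..., 0, 0] = 4.0  # linear term on x
    coeffs[..., 1, 1] = 2.0  # quadratic term on y
    for t in (0.0, 0.25, 0.5):
        print(t, decode_positions(coeffs, t)[1, 2])
```

Because the trajectory is a closed-form function of `t`, positions can be queried at any timestamp without re-running the network, which is the property that makes frame prediction at arbitrary times possible.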