TimeRewind: Rewinding Time with Image-and-Events Video Diffusion
Jingxi Chen, Brandon Y. Feng, Haoming Cai, Mingyang Xie, Christopher Metzler, Cornelia Fermuller, Yiannis Aloimonos
TL;DR
This work tackles the ill-posed problem of recovering pre-capture motion from a single image by using neuromorphic event cameras to provide motion cues. It introduces TimeRewind, a framework that keeps a pre-trained image-to-video (Img2Vid) diffusion model frozen and augments it with an Event Motion Adaptor (EMA) conditioned on event data, so that the synthesized backward-time videos are physically grounded in the recorded events. In experiments on the BS-ERGB RGB-Event dataset, TimeRewind scores better than both the baselines and RGB-Event backbones on fidelity and perceptual metrics (e.g., PSNR, SSIM, LPIPS), indicating robust backward-time video synthesis and improved motion realism. The approach offers practical insights for future consumer cameras and smartphones and opens new research directions at the intersection of event sensing and generative video modeling.
Abstract
This paper addresses the novel challenge of "rewinding" time from a single captured image to recover the fleeting moments missed just before the shutter button is pressed. This problem poses a significant challenge in computer vision and computational photography, as it requires predicting plausible pre-capture motion from a single static frame, an inherently ill-posed task because potential pixel movements have many degrees of freedom. We overcome this challenge by leveraging the emerging technology of neuromorphic event cameras, which capture motion information with high temporal resolution, and integrating this data with advanced image-to-video diffusion models. Our proposed framework introduces an event motion adaptor conditioned on event camera data, guiding the diffusion model to generate videos that are visually coherent and physically grounded in the captured events. Through extensive experimentation, we demonstrate the capability of our approach to synthesize high-quality videos that effectively "rewind" time, showcasing the potential of combining event camera technology with generative models. Our work opens new avenues for research at the intersection of computer vision, computational photography, and generative modeling, offering a forward-thinking solution to capturing missed moments and enhancing future consumer cameras and smartphones. Please see the project page at https://timerewind.github.io/ for video results and code release.
