HR-INR: Continuous Space-Time Video Super-Resolution via Event Camera
Yunfan Lu, Yusheng Wang, Zipeng Wang, Pengteng Li, Bin Yang, Hui Xiong
TL;DR
HR-INR uses high-temporal-resolution event data for continuous space-time video super-resolution. It introduces an event temporal pyramid representation to capture fast regional motion, and combines regional and holistic feature extraction with an INR-based spatiotemporal decoder that uses temporal and spatial embeddings to produce arbitrary-scale outputs. Across four datasets, HR-INR achieves state-of-the-art performance and better temporal stability than both frame-based and prior event-guided methods, while keeping the model compact and inference efficient. The approach targets video enhancement in dynamic scenes and opens avenues for broader event-driven video processing tasks.
Abstract
Continuous space-time video super-resolution (C-STVSR) aims to simultaneously enhance video resolution and frame rate at an arbitrary scale. Recently, implicit neural representation (INR) has been applied to video restoration, representing videos as implicit fields that can be decoded at an arbitrary scale. However, existing INR-based C-STVSR methods typically rely on only two frames as input, which provides insufficient inter-frame motion information. Consequently, they struggle to capture fast, complex motion and long-term dependencies (spanning more than three frames), which limits their performance in dynamic scenes. In this paper, we propose a novel C-STVSR framework, named HR-INR, which captures both holistic dependencies and regional motions based on INR. It is assisted by an event camera, a novel sensor renowned for its high temporal resolution and low latency. To fully utilize the rich temporal information from events, we design a feature extraction module consisting of (1) a regional event feature extractor, which takes events as input via the proposed event temporal pyramid representation to capture regional nonlinear motion, and (2) a holistic event-frame feature extractor for long-term dependencies and motion continuity. We then propose a novel INR-based decoder with spatiotemporal embeddings to capture long-term dependencies with a larger temporal perception field. We validate the effectiveness and generalization of our method on four datasets (both simulated and real data), and the results show its superiority. The project page is available at https://github.com/yunfanLu/HR-INR
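
To make the idea of an event temporal pyramid more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each pyramid level bins events from a progressively shorter time window centred on a target timestamp, so that later levels resolve motion near that timestamp at finer temporal detail. The function name `event_temporal_pyramid`, the `(t, x, y, polarity)` event layout, and the parameters `base_window`, `levels`, `bins`, and `shrink` are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed event layout: (timestamp, x, y, polarity in {-1, +1}).
Event = Tuple[float, int, int, int]


def event_temporal_pyramid(
    events: Sequence[Event],
    t_center: float,
    base_window: float,
    height: int,
    width: int,
    levels: int = 3,
    bins: int = 4,
    shrink: float = 2.0,
) -> List[List[List[List[int]]]]:
    """Return pyramid[level][bin][y][x] holding summed event polarities.

    Level 0 covers `base_window` time units centred on `t_center`.
    Each further level covers a window `shrink` times shorter
    (an assumption of this sketch, not a detail from the abstract).
    """
    pyramid = []
    for level in range(levels):
        window = base_window / (shrink ** level)
        t_start = t_center - window / 2.0
        # One height x width grid of accumulated polarity per temporal bin.
        grid = [[[0] * width for _ in range(height)] for _ in range(bins)]
        for t, x, y, p in events:
            offset = t - t_start
            if not (0.0 <= offset < window):
                continue  # event lies outside this level's time window
            if not (0 <= x < width and 0 <= y < height):
                continue  # event lies outside the sensor area
            b = min(int(offset / window * bins), bins - 1)
            grid[b][y][x] += p
        pyramid.append(grid)
    return pyramid


# Toy usage: four events on a 2x2 sensor, two levels with two bins each.
events = [(0.10, 0, 0, 1), (0.45, 1, 0, 1), (0.52, 1, 1, -1), (0.90, 0, 1, 1)]
pyramid = event_temporal_pyramid(
    events, t_center=0.5, base_window=1.0, height=2, width=2, levels=2, bins=2
)
```

In this toy example, level 0 sees all four events, while level 1 keeps only the two events closest to the target timestamp and spreads them over the same number of bins. That illustrates how narrower windows trade temporal coverage for finer temporal resolution.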
