UniINR: Event-guided Unified Rolling Shutter Correction, Deblurring, and Interpolation
Yunfan LU, Guoqiang Liang, Yusheng Wang, Lin Wang, Hui Xiong
TL;DR
This work tackles the recovery of high-frame-rate global shutter (GS) frames from rolling shutter (RS) blur frames captured under motion, using paired events. It introduces UniINR, a one-stage framework that unifies RS correction, deblurring, and interpolation through three components: a Spatial-Temporal Implicit Encoding (STE) that produces a spatial-temporal representation (STR), an Exposure Time Embedding (ETE) that injects per-pixel exposure time into a temporal tensor, and a Pixel-by-pixel Decoder (PPD) that renders frames from $T$ and $\theta$ via $I=f_d(T,\theta)=f_{mlp}^{\circlearrowright^5}(T\oplus\theta)$. The model is lightweight (0.379M parameters) and runs at 2.83 ms per frame for 31× interpolation, outperforming state-of-the-art methods on simulated and real datasets. Training is guided by an RS blur consistency loss $\mathcal{L}_b$ and a GS frame reconstruction loss $\mathcal{L}_{re}$. Ablation studies confirm the value of the learned exposure-time embedding and show improved robustness to complex motion. The code is publicly available.
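As a rough illustration of the decoder formula $I=f_d(T,\theta)=f_{mlp}^{\circlearrowright^5}(T\oplus\theta)$, the following NumPy sketch concatenates two per-pixel feature maps along the channel axis and passes every pixel through the same five-layer MLP. Several points are assumptions made only for this sketch and are not stated in the summary above: that $T$ is the temporal tensor and $\theta$ the STR, that $\oplus$ means channel-wise concatenation, that $\circlearrowright^5$ means five stacked layers, and that the hidden layers use ReLU. The function names `init_mlp` and `pixel_decoder`, the hidden width, and the channel counts are hypothetical.

```python
import numpy as np

def init_mlp(rng, in_dim, hidden_dim, out_dim, num_layers=5):
    """Create weights for a plain MLP with `num_layers` linear layers.

    Layer sizes and initialization are illustrative assumptions.
    """
    dims = [in_dim] + [hidden_dim] * (num_layers - 1) + [out_dim]
    return [
        (rng.standard_normal((a, b)) * np.sqrt(2.0 / a), np.zeros(b))
        for a, b in zip(dims[:-1], dims[1:])
    ]

def pixel_decoder(T, theta, params):
    """Decode an image from per-pixel features.

    T:     (H, W, C_t) temporal tensor (assumed layout)
    theta: (H, W, C_s) spatial-temporal representation (assumed layout)
    Returns an (H, W, out_dim) array.
    """
    # Channel-wise concatenation, our reading of T ⊕ θ.
    x = np.concatenate([T, theta], axis=-1)
    for i, (W, b) in enumerate(params):
        # Matrix multiply acts on the last axis only, so each pixel
        # is processed independently with shared weights.
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers (assumption)
    return x
```

Because the weights are shared and applied only along the channel axis, changing the features of one pixel changes only that pixel's output, which is what "pixel-by-pixel" decoding suggests.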
Abstract
Video frames captured by rolling shutter (RS) cameras during fast camera movement frequently exhibit RS distortion and blur simultaneously. Recovering high-frame-rate, sharp global shutter (GS) frames from an RS blur frame therefore requires handling RS correction, deblurring, and frame interpolation jointly. A naive approach is to decompose the process into separate tasks and cascade existing methods; however, this accumulates errors and produces noticeable artifacts. Event cameras offer several advantages, e.g., high temporal resolution, that make them well suited to this problem. To this end, we propose UniINR, the first approach to recover sharp GS frames at arbitrary frame rates from an RS blur frame and paired events. Our key idea is to use a unified spatial-temporal implicit neural representation (INR) that directly maps position and time coordinates to color values, thereby addressing the interlocking degradations. Specifically, we introduce spatial-temporal implicit encoding (STE) to convert an RS blur image and events into a spatial-temporal representation (STR). To query a specific sharp frame (GS or RS), we embed the exposure time into the STR and decode the embedded features pixel by pixel to recover a sharp frame. Our method features a lightweight model with only 0.38M parameters and high inference efficiency, achieving 2.83 ms/frame for 31× frame interpolation of an RS blur frame. Extensive experiments show that our method significantly outperforms prior methods. Code is available at https://github.com/yunfanLu/UniINR.
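To make the idea of querying a GS or RS frame by exposure time more concrete, the sketch below builds per-pixel timestamp maps. In a GS frame every pixel shares one timestamp, whereas in an RS frame each row is read out at a different time. The normalized time range, the top-to-bottom readout order, the linear row delay, and the helper names `gs_time_map`, `rs_time_map`, and `query_times` are all assumptions for illustration; the abstract does not specify how the exposure time is parameterized.

```python
import numpy as np

def gs_time_map(height, width, t):
    """Global shutter: every pixel is exposed at the same time t."""
    return np.full((height, width), t, dtype=np.float64)

def rs_time_map(height, width, t_start, t_end):
    """Rolling shutter: row timestamps vary linearly from t_start
    (top row) to t_end (bottom row). Readout direction and linearity
    are assumptions of this sketch.
    """
    rows = np.linspace(t_start, t_end, height)
    return np.repeat(rows[:, None], width, axis=1)

def query_times(num_frames, t0=0.0, t1=1.0):
    """Evenly spaced timestamps at which sharp GS frames are queried."""
    return np.linspace(t0, t1, num_frames)
```

For example, `query_times(31)` yields 31 evenly spaced timestamps in a normalized interval, and passing each one to `gs_time_map` gives the per-pixel time input for one queried GS frame. Since time is a continuous input, any other number of frames can be queried the same way.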
