BF-STVSR: B-Splines and Fourier-Best Friends for High Fidelity Spatial-Temporal Video Super-Resolution
Eunjin Kim, Hyeonjin Kim, Kyong Hwan Jin, Jaejun Yoo
TL;DR
This work tackles Continuous Spatial-Temporal Video Super-Resolution (C-STVSR) by addressing spectral bias and the limitations of both coordinate-based INR encodings and pre-trained optical-flow networks. It introduces BF-STVSR, a flow-free framework built from two axis-specific modules: a Temporal B-spline Mapper for smooth motion interpolation and a Spatial Fourier Mapper for capturing dominant spatial frequencies. Together they support arbitrary time $t \in [0,1]$ and scale $s$ without guidance from an external optical-flow network such as RAFT. The method achieves state-of-the-art PSNR, SSIM, and video-quality metrics on standard benchmarks, and it reduces computational cost by learning motion from encoded features and applying forward warping, with no optical-flow supervision. These results indicate that axis-specific, frequency-aware representations can robustly model spatio-temporal video structure for continuous interpolation with practical efficiency.
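To make the idea of querying motion at an arbitrary time $t \in [0,1]$ concrete, the following is a minimal sketch of the textbook uniform cubic B-spline basis. It is not the authors' Temporal B-spline Mapper: the function names `cubic_bspline_weights` and `interpolate_motion`, the choice of a cubic basis, and the four hand-written control vectors are all assumptions made for illustration, whereas the actual module learns its motion representation from encoded features.

```python
def cubic_bspline_weights(t):
    """Return the four uniform cubic B-spline basis weights for t in [0, 1]."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [
        (1.0 - t) ** 3 / 6.0,
        (3.0 * t**3 - 6.0 * t**2 + 4.0) / 6.0,
        (-3.0 * t**3 + 3.0 * t**2 + 3.0 * t + 1.0) / 6.0,
        t**3 / 6.0,
    ]


def interpolate_motion(control_vectors, t):
    """Blend four 2-D control vectors (dx, dy) at time t.

    The control vectors are hypothetical stand-ins for learned coefficients.
    """
    if len(control_vectors) != 4:
        raise ValueError("expected exactly four control vectors")
    weights = cubic_bspline_weights(t)
    dx = sum(w * v[0] for w, v in zip(weights, control_vectors))
    dy = sum(w * v[1] for w, v in zip(weights, control_vectors))
    return dx, dy


# Hypothetical per-pixel motion coefficients (illustrative values only).
controls = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.5), (3.0, 0.0)]
motion_at_mid = interpolate_motion(controls, 0.5)
```

Because the four weights are non-negative and sum to one for every $t$, the blended vector always stays inside the convex hull of the control vectors, which is one reason B-splines give smooth, well-behaved interpolation between frames.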
Abstract
While prior methods in Continuous Spatial-Temporal Video Super-Resolution (C-STVSR) employ Implicit Neural Representation (INR) for continuous encoding, they often struggle to capture the complexity of video data because they rely on simple coordinate concatenation and pre-trained optical flow networks for motion representation. Interestingly, we find that, contrary to common observations, adding position encoding does not improve performance and even degrades it. This issue becomes particularly pronounced when position encoding is combined with pre-trained optical flow networks, which can limit the model's flexibility. To address these issues, we propose BF-STVSR, a C-STVSR framework with two key modules tailored to better represent the spatial and temporal characteristics of video: 1) a B-spline Mapper for smooth temporal interpolation, and 2) a Fourier Mapper for capturing dominant spatial frequencies. Our approach achieves state-of-the-art results on various metrics, including PSNR and SSIM, showing enhanced spatial details and natural temporal consistency. Our code is available at https://github.com/Eunjnnn/bfstvsr.
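As a rough illustration of what representing a signal by its dominant spatial frequencies allows, the sketch below evaluates a small sum of 2-D cosine components at continuous coordinates, so the same representation can be sampled on grids of any size. It is a generic Fourier-series example and not the authors' Fourier Mapper: the names `fourier_value` and `sample_grid` and the amplitude, frequency, and phase values are hypothetical, and a learned model would predict such quantities rather than have them written by hand.

```python
import math


def fourier_value(x, y, components):
    """Evaluate a sum of 2-D cosine components at the continuous point (x, y).

    Each component is (amplitude, freq_x, freq_y, phase), with frequencies
    given in cycles per unit length.
    """
    return sum(
        amplitude * math.cos(2.0 * math.pi * (fx * x + fy * y) + phase)
        for amplitude, fx, fy, phase in components
    )


def sample_grid(components, size):
    """Sample the continuous signal on a size x size grid over [0, 1)^2."""
    return [
        [fourier_value(col / size, row / size, components) for col in range(size)]
        for row in range(size)
    ]


# Hypothetical hand-picked components (illustrative values only):
# one low horizontal frequency and one weaker, higher vertical frequency.
components = [(1.0, 1.0, 0.0, 0.0), (0.25, 0.0, 4.0, 0.0)]

# The same representation queried at two different output resolutions.
coarse = sample_grid(components, 4)
fine = sample_grid(components, 16)
```

The output resolution is chosen only at query time, which is the property that makes a frequency-based representation a natural fit for super-resolution at an arbitrary scale.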
