Continuous Space-Time Video Super-Resolution with 3D Fourier Fields

Alexander Becker; Julius Erbach; Dominik Narnhofer; Konrad Schindler

TL;DR

The paper addresses continuous space-time video super-resolution by introducing the Video Fourier Field (VFF), which models a video as a finite sum of 3D sinusoids, $\hat{V}(x,y,t) = \sum_{i=1}^{N} a_i \sin(\bm{\omega}_i \cdot (x,y,t) + \phi_i)$. A neural encoder with a large spatio-temporal receptive field predicts voxel-wise amplitudes $a_i$ and phases $\phi_i$, while a shared set of frequencies $\bm{\omega}_i$ enables coherent, warp-free reconstruction. Because $\hat{V}(x,y,t)$ is a continuous function, it can be sampled efficiently at arbitrary spatio-temporal coordinates, and thus at any resolution; a Gaussian PSF incorporated into the sampling provides anti-aliasing. The method achieves competitive or state-of-the-art PSNR/SSIM across multiple benchmarks and tasks (AVSR, VFI, and general C-STVSR) with improved temporal consistency and efficiency, and its results improve further with larger models and longer temporal context. This unified, continuous representation reduces reliance on explicit motion warping, exploits long-range temporal context, and enables high-quality, flexible video enhancement at arbitrary scales.
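To make the formula for $\hat{V}(x,y,t)$ concrete, here is a minimal NumPy sketch that evaluates a sum of 3D sinusoids at arbitrary continuous coordinates and samples whole frames at any resolution and at non-integer times. It is an illustration under stated assumptions, not the authors' implementation: it uses a single global set of amplitudes and phases, whereas the method predicts them per voxel (how per-voxel fields are combined at query time is not covered here). The function names `eval_vff` and `sample_frame`, and the normalization of coordinates to [0, 1], are hypothetical.

```python
import numpy as np


def eval_vff(coords, omegas, amps, phases):
    """Evaluate V_hat(x, y, t) = sum_i a_i * sin(omega_i . (x, y, t) + phi_i).

    coords: (M, 3) continuous query coordinates (x, y, t)
    omegas: (N, 3) shared frequency vectors omega_i
    amps:   (N,)   amplitudes a_i
    phases: (N,)   phases phi_i
    returns (M,) field values
    """
    coords = np.asarray(coords, dtype=float)
    omegas = np.asarray(omegas, dtype=float)
    # Inner product omega_i . (x, y, t) for every query/frequency pair, plus phase.
    arg = coords @ omegas.T + np.asarray(phases, dtype=float)  # (M, N)
    # Weight each sinusoid by its amplitude and sum over i.
    return np.sin(arg) @ np.asarray(amps, dtype=float)


def sample_frame(omegas, amps, phases, height, width, t):
    """Sample a height x width frame at time t (hypothetical [0, 1] coordinates)."""
    ys, xs = np.meshgrid(np.linspace(0.0, 1.0, height),
                         np.linspace(0.0, 1.0, width), indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel(), np.full(xs.size, float(t))], axis=1)
    return eval_vff(coords, omegas, amps, phases).reshape(height, width)


# Usage: random coefficients stand in for the encoder's predictions.
rng = np.random.default_rng(0)
N = 16
omegas = rng.normal(scale=10.0, size=(N, 3))
amps = rng.normal(size=N)
phases = rng.uniform(0.0, 2 * np.pi, size=N)

low = sample_frame(omegas, amps, phases, 32, 32, t=0.5)
high = sample_frame(omegas, amps, phases, 128, 128, t=0.5)   # 4x denser grid
between = sample_frame(omegas, amps, phases, 128, 128, t=0.37)  # in-between time
```

Because the field is a closed-form function, the low- and high-resolution grids agree exactly wherever they share a coordinate (for example the frame corners), and intermediate times need no frame warping: they are simply different values of $t$.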

Abstract

We introduce a novel formulation for continuous space-time video super-resolution. Instead of decoupling the representation of a video sequence into separate spatial and temporal components and relying on brittle, explicit frame warping for motion compensation, we encode video as a continuous, spatio-temporally coherent 3D Video Fourier Field (VFF). This representation offers three key advantages: (1) it enables cheap, flexible sampling at arbitrary locations in space and time; (2) it simultaneously captures fine spatial detail and smooth temporal dynamics; and (3) it allows an analytical Gaussian point spread function to be included in the sampling, ensuring aliasing-free reconstruction at arbitrary scales. The coefficients of the proposed Fourier-like sinusoidal basis are predicted by a neural encoder with a large spatio-temporal receptive field, conditioned on the low-resolution input video. Through extensive experiments, we show that our joint modeling substantially improves both spatial and temporal super-resolution and sets a new state of the art on multiple benchmarks: across a wide range of upscaling factors, it delivers sharper and temporally more consistent reconstructions than existing baselines while being computationally more efficient. Project page: https://v3vsr.github.io.
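Advantage (3) rests on a standard identity, shown here as an illustration rather than as the paper's exact formulation. Let $f(\mathbf{p}) = \sin(\bm{\omega} \cdot \mathbf{p} + \phi)$ with $\mathbf{p} = (x,y,t)$, and let $G_\Sigma$ be a zero-mean Gaussian with covariance $\Sigma$. Because the Gaussian's characteristic function is real, convolution only rescales the amplitude: $(f * G_\Sigma)(\mathbf{p}) = e^{-\frac{1}{2}\bm{\omega}^\top \Sigma \bm{\omega}} f(\mathbf{p})$. High frequencies, which would alias under coarse sampling, are therefore damped in closed form. The minimal sketch below assumes a diagonal covariance given as per-axis standard deviations; `eval_vff_blurred` and `sigmas` are hypothetical names, and how the paper chooses $\Sigma$ relative to the output sampling rate is not specified here.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def eval_vff_blurred(coords, omegas, amps, phases, sigmas):
    """Evaluate sum_i a_i sin(omega_i . p + phi_i) convolved with a Gaussian PSF.

    The PSF is a zero-mean Gaussian over (x, y, t) with diagonal covariance
    diag(sigmas**2). Each sinusoid is attenuated by exp(-0.5 * omega^T Sigma omega),
    so the blurred field is computed analytically, without numerical filtering.
    """
    coords = np.asarray(coords, dtype=float)
    omegas = np.asarray(omegas, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    # omega^T Sigma omega for diagonal Sigma = sum_k (omega_k * sigma_k)^2
    quad = np.sum((omegas * sigmas) ** 2, axis=1)  # (N,)
    damped_amps = np.asarray(amps, dtype=float) * np.exp(-0.5 * quad)
    arg = coords @ omegas.T + np.asarray(phases, dtype=float)  # (M, N)
    return np.sin(arg) @ damped_amps


# Check the closed form against numerical convolution in 1D.
# hermegauss gives nodes/weights for the weight exp(-z^2 / 2); the weights sum to sqrt(2*pi).
z, w = hermegauss(40)
x0, omega, a, phi, s = 0.4, 2.0, 1.5, 0.3, 0.3
# (f * G)(x0) = E_u[a sin(omega (x0 - u) + phi)] with u = s * z, z ~ N(0, 1)
numeric = np.sum(w * a * np.sin(omega * (x0 - s * z) + phi)) / np.sqrt(2 * np.pi)
analytic = eval_vff_blurred([[x0, 0.0, 0.0]], [[omega, 0.0, 0.0]], [a], [phi], [s, 0.0, 0.0])[0]
```

With zero blur the function reduces to plain evaluation of the sinusoid sum, and for a fixed PSF width a higher-frequency component is attenuated more strongly than a lower-frequency one, which is the anti-aliasing behavior the abstract describes.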
