Kalman-Inspired Feature Propagation for Video Face Super-Resolution
Ruicheng Feng, Chongyi Li, Chen Change Loy
TL;DR
This work targets two challenges in video face super-resolution (VFSR): facial detail fidelity and temporal coherence. It introduces KEEP, a Kalman-inspired feature propagation framework that maintains a latent face prior over time by recurrently updating a latent state $z_t$ with information from previously restored frames, guided by a learned Kalman gain network. The method formulates a state-space model in latent space, using a CodeFormer-based generative backbone and a Kalman Filter Network to fuse predictive and observed information, with local temporal consistency enforced via cross-frame attention. Empirical results on VFHQ show that KEEP improves both fidelity (PSNR/SSIM/LPIPS) and temporal stability (IDS/AKD) compared to frame-by-frame image-based SR and standard VSR baselines, and that it remains robust under severe degradations and non-frontal views. Code and video demos are provided.
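To make the predict-then-fuse idea concrete, the following is a minimal sketch of one step of a classical linear Kalman filter applied to a latent vector. It is not the KEEP implementation: KEEP is described as using a learned Kalman gain network and a generative backbone, whereas this sketch uses the textbook closed-form gain. The names `kalman_step` and `run_filter`, the identity transition and observation matrices, and the noise levels `q` and `r` are all illustrative assumptions.

```python
import numpy as np


def kalman_step(z_prev, P_prev, y_t, F, H, Q, R):
    """One predict/update step of a classical linear Kalman filter.

    z_prev, P_prev: previous latent state estimate and its covariance.
    y_t: current observation (here, a noisy per-frame latent code).
    F, H: assumed linear transition and observation matrices.
    Q, R: assumed process and observation noise covariances.
    """
    # Predict: propagate the previous state forward in time.
    z_pred = F @ z_prev
    P_pred = F @ P_prev @ F.T + Q

    # Gain: closed-form Kalman gain (KEEP learns this with a network instead).
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)

    # Update: fuse the prediction with the current observation.
    z_new = z_pred + K @ (y_t - H @ z_pred)
    P_new = (np.eye(z_prev.shape[0]) - K @ H) @ P_pred
    return z_new, P_new, K


def run_filter(observations, q=1e-3, r=0.25):
    """Filter a sequence of per-frame latent observations, shape (T, dim)."""
    dim = observations.shape[1]
    F = np.eye(dim)  # assumption: the latent face prior changes slowly
    H = np.eye(dim)  # assumption: the latent state is observed directly
    Q = q * np.eye(dim)
    R = r * np.eye(dim)

    z = observations[0].copy()  # initialise from the first frame
    P = R.copy()
    states = [z]
    for y_t in observations[1:]:
        z, P, _ = kalman_step(z, P, y_t, F, H, Q, R)
        states.append(z)
    return np.stack(states)
```

Under these assumptions, calling `run_filter` on a `(T, dim)` array of noisy per-frame latent codes returns a temporally smoothed sequence of the same shape. The gain `K` controls how much each frame trusts the new observation relative to the propagated state, which is the role the summary above attributes to the learned gain network.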
Abstract
Despite promising progress in face image super-resolution, video face super-resolution remains relatively under-explored. Existing approaches either adapt general video super-resolution networks to face datasets or apply established face image super-resolution models independently to individual video frames. The former struggle to reconstruct facial details, while the latter struggle to maintain temporal consistency. To address these issues, we introduce a novel framework called Kalman-inspired Feature Propagation (KEEP), designed to maintain a stable face prior over time. Kalman filtering principles give our method a recurrent ability to use information from previously restored frames to guide and regulate the restoration of the current frame. Extensive experiments demonstrate the effectiveness of our method in capturing facial details consistently across video frames. Code and video demo are available at https://jnjaby.github.io/projects/KEEP.
