Video Frame Interpolation for Polarization via Swin-Transformer
Feng Huang, Xin Zhang, Yixuan Xu, Xuesong Wang, Xianyu Wu
TL;DR
The paper addresses the challenge of interpolating polarized video frames, where polarization signals vary with viewpoint and traditional VFI methods struggle to preserve polarization cues. It introduces Swin-VFI, a multi-stage, multi-scale Video Swin Transformer that uses local shifted-cube self-attention to capture long-range spatiotemporal dependencies at reduced computational cost. A polarization-aware loss, combining intensity and polarization terms, guides the network to recover the angle of linear polarization (AoLP) and the degree of linear polarization (DoLP) accurately. Evaluations on the polarization datasets PVFI-Mono and PHSPD, as well as on conventional VFI benchmarks, show that Swin-VFI achieves superior reconstruction accuracy on both intensity and polarization metrics while using significantly fewer parameters and FLOPs, which makes its interpolated frames usable for Shape from Polarization (SfP) and human shape reconstruction. Future work will extend the method to color-polarized video interpolation and broader polarization modalities.
Abstract
Video Frame Interpolation (VFI) has been extensively explored and demonstrated, yet its application to polarization remains largely unexplored. Because polarized filters transmit light selectively, longer exposure times are typically required to ensure sufficient light intensity, which in turn lowers the temporal sampling rate. Furthermore, because the polarization reflected by objects varies with the shooting perspective, estimating pixel displacement alone is insufficient to accurately reconstruct the intermediate polarization. To tackle these challenges, this study proposes a multi-stage and multi-scale network called Swin-VFI, based on the Swin-Transformer, and introduces a tailored loss function to facilitate the network's understanding of polarization changes. To ensure the practicality of the proposed method, this study evaluates its interpolated frames in Shape from Polarization (SfP) and Human Shape Reconstruction tasks, comparing them with other state-of-the-art methods such as CAIN, FLAVR, and VFIT. Experimental results demonstrate our approach's superior reconstruction accuracy across all tasks.
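For readers unfamiliar with the polarization quantities named in the TL;DR (AoLP and DoLP), the following is a minimal per-pixel sketch of how they are conventionally derived from intensities measured behind linear polarizers at 0°, 45°, 90°, and 135°, using the standard linear Stokes parameters. The four-angle input layout, the function name `polarization_from_four_angles`, and the `eps` stabilizer are assumptions made for illustration; they are not taken from the paper and do not describe its loss function.

```python
import math


def polarization_from_four_angles(i0, i45, i90, i135, eps=1e-8):
    """Return (s0, dolp, aolp) for one pixel.

    Inputs are intensities behind linear polarizers at 0, 45, 90 and
    135 degrees (an assumed sensor layout, not confirmed by the paper).
    """
    # Linear Stokes parameters.
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90                       # horizontal vs. vertical
    s2 = i45 - i135                     # +45 vs. -45 degrees

    # Degree of linear polarization, in [0, 1]; eps avoids division by zero.
    dolp = math.sqrt(s1 * s1 + s2 * s2) / (s0 + eps)

    # Angle of linear polarization, in radians, in (-pi/2, pi/2].
    aolp = 0.5 * math.atan2(s2, s1)
    return s0, dolp, aolp


if __name__ == "__main__":
    # Light fully polarized along the 45-degree axis.
    print(polarization_from_four_angles(0.5, 1.0, 0.5, 0.0))
```

Because AoLP is an angle with a period of π, and both AoLP and DoLP depend nonlinearly on the measured intensities, a small intensity error can become a large polarization error where the total intensity is low. This is one plausible reason for supervising polarization terms in addition to intensity.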
