BiM-VFI: Bidirectional Motion Field-Guided Frame Interpolation for Video with Non-uniform Motions
Wonyong Seo, Jihyong Oh, Munchurl Kim
TL;DR
This paper tackles time-to-location (TTL) ambiguity in video frame interpolation (VFI), which arises when models are trained on videos with non-uniform motions and often yields blurry results. It introduces the Bidirectional Motion field (BiM) as a motion descriptor and couples it with a BiM-guided FlowNet (BiMFN) and a Content-Aware Upsampling Network (CAUN), all trained with Knowledge Distillation for VFI-centric Flow supervision (KDVCF) to constrain learning toward VFI-relevant motion. The BiM descriptor captures both the magnitude ratio and the directional difference between the bidirectional flows, enabling more precise motion estimation and reduced blur, while KDVCF aligns the training objective with VFI goals. Empirical results show perceptual gains over state-of-the-art methods on arbitrary-time interpolation benchmarks, with reported improvements of 26% in LPIPS and 45% in STLPIPS, and ablations validate the contributions of BiM, CAUN, and KDVCF. The approach offers a practical path to sharper interpolation when uniform motion is assumed at inference, while acknowledging the TTL-related limitations that remain at inference time.
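To make the idea of a magnitude ratio and a directional difference concrete, the following is a minimal per-pixel sketch in Python. It is not the authors' implementation: the function names `bim_descriptor` and `uniform_flows`, the normalization of the ratio as |V_t→0| / (|V_t→0| + |V_t→1|), and the use of a signed angle are assumptions made for illustration, since this summary does not give the exact definition.

```python
import math

def bim_descriptor(flow_t0, flow_t1, eps=1e-8):
    """Illustrative (assumed) descriptor for one pixel.

    flow_t0: (u, v) flow from the target time t back to frame 0.
    flow_t1: (u, v) flow from the target time t forward to frame 1.
    Returns (ratio, angle): the share of total motion magnitude that
    lies on the frame-0 side, and the signed angle from flow_t0 to flow_t1.
    """
    u0, v0 = flow_t0
    u1, v1 = flow_t1
    mag0 = math.hypot(u0, v0)
    mag1 = math.hypot(u1, v1)
    # Assumed normalization: ratio in [0, 1); eps avoids division by zero.
    ratio = mag0 / (mag0 + mag1 + eps)
    # Signed angle between the two vectors via the 2D cross and dot products.
    cross = u0 * v1 - v0 * u1
    dot = u0 * u1 + v0 * v1
    angle = math.atan2(cross, dot)
    return ratio, angle

def uniform_flows(velocity, t):
    """Bidirectional flows at time t for constant-velocity linear motion.

    velocity is the displacement from frame 0 to frame 1 (hypothetical helper).
    """
    vx, vy = velocity
    return (-t * vx, -t * vy), ((1 - t) * vx, (1 - t) * vy)

# Uniform motion: a pixel moving 4 px to the right, queried at t = 0.25.
f_t0, f_t1 = uniform_flows((4.0, 0.0), 0.25)
ratio, angle = bim_descriptor(f_t0, f_t1)
print(ratio, angle)
```

Under this assumed definition, uniform linear motion gives a ratio equal to t and flows that point in opposite directions (an angle of magnitude π). Accelerating, decelerating, or turning motion would move the pair away from those values, which is the kind of information a time index alone cannot convey.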
Abstract
Existing video frame interpolation (VFI) models tend to suffer from time-to-location ambiguity when trained on videos with non-uniform motions, such as accelerating, decelerating, and direction-changing motions, which often yields blurred interpolated frames. In this paper, we propose (i) a novel motion description map, Bidirectional Motion field (BiM), to effectively describe non-uniform motions; (ii) a BiM-guided Flow Net (BiMFN) with Content-Aware Upsampling Network (CAUN) for precise optical flow estimation; and (iii) Knowledge Distillation for VFI-centric Flow supervision (KDVCF) to supervise the motion estimation of the VFI model with VFI-centric teacher flows. The proposed VFI model is called the Bidirectional Motion field-guided VFI (BiM-VFI) model. Extensive experiments show that our BiM-VFI model significantly surpasses recent state-of-the-art VFI methods, with improvements of 26% in LPIPS and 45% in STLPIPS, yielding interpolated frames with far less blur at arbitrary time instances.
