BiM-VFI: Bidirectional Motion Field-Guided Frame Interpolation for Video with Non-uniform Motions

Wonyong Seo, Jihyong Oh, Munchurl Kim

TL;DR

This paper tackles time-to-location (TTL) ambiguity in video frame interpolation (VFI) with non-uniform motions, an ambiguity that often yields blurry results. It introduces the Bidirectional Motion Field (BiM) as a motion descriptor and couples it with a BiM-guided FlowNet (BiMFN) and a Content-Aware Upsampling Network (CAUN). The whole model is trained with Knowledge Distillation for VFI-Centric Flow supervision (KDVCF), which steers learning toward VFI-relevant motion. The BiM descriptor encodes both the magnitude ratio and the directional difference between the bidirectional flows, enabling more precise motion estimation and less blur, while KDVCF aligns flow supervision with the VFI objective. Experiments show substantial perceptual gains over state-of-the-art methods on arbitrary-time interpolation benchmarks, most notably in LPIPS and STLPIPS, and ablations validate the contributions of BiM, CAUN, and KDVCF. The approach offers a practical path to sharper interpolation when uniform motion is assumed at inference time, while acknowledging the TTL-related limitations that remain inherent at inference.
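As a rough illustration of "magnitude ratios and directional differences between bidirectional flows", the sketch below computes such a pair for a single pixel. It assumes (our assumption, not the paper's stated definition) that the two flows point from the pixel at time t toward frame 0 and toward frame 1, that the ratio is |F_t→0| / |F_t→1|, and that the directional difference is the directed angle from F_t→1 to F_t→0. The function name `bim_pair` is hypothetical, and the paper's actual BiM encoding and normalization may differ.

```python
import math


def bim_pair(flow_t0: tuple[float, float], flow_t1: tuple[float, float]) -> tuple[float, float]:
    """Hypothetical per-pixel BiM-style descriptor (ratio, angle).

    flow_t0: flow vector from the pixel at time t toward frame 0.
    flow_t1: flow vector from the pixel at time t toward frame 1.
    Returns (|flow_t0| / |flow_t1|, directed angle from flow_t1 to flow_t0 in [-pi, pi]).
    This encoding is an illustrative assumption, not the paper's exact definition.
    """
    mag_t0 = math.hypot(*flow_t0)
    mag_t1 = math.hypot(*flow_t1)
    if mag_t1 == 0.0:
        raise ValueError("flow toward frame 1 has zero length; the ratio is undefined")
    ratio = mag_t0 / mag_t1

    # Directed angle from u = flow_t1 to v = flow_t0 is atan2(u x v, u . v).
    cross = flow_t1[0] * flow_t0[1] - flow_t1[1] * flow_t0[0]
    dot = flow_t1[0] * flow_t0[0] + flow_t1[1] * flow_t0[1]
    angle = math.atan2(cross, dot)
    return ratio, angle
```

For uniform linear motion with displacement v between frames 0 and 1, the flows are −t·v and (1 − t)·v, so the pair is (t / (1 − t), π) for every t in (0, 1); for example, v = (4, 2) and t = 0.25 give flows (−1, −0.5) and (3, 1.5) and the pair (1/3, π). Accelerating or turning motion moves the pair away from this curve, which is information a bare time index cannot express.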

Abstract

Existing video frame interpolation (VFI) models tend to suffer from time-to-location (TTL) ambiguity when trained with videos of non-uniform motions, such as accelerating, decelerating, and direction-changing motions, which often yields blurred interpolated frames. In this paper, we propose (i) a novel motion description map, the Bidirectional Motion Field (BiM), to effectively describe non-uniform motions; (ii) a BiM-guided FlowNet (BiMFN) with a Content-Aware Upsampling Network (CAUN) for precise optical flow estimation; and (iii) Knowledge Distillation for VFI-centric Flow supervision (KDVCF) to supervise the motion estimation of the VFI model with VFI-centric teacher flows. The resulting model is called Bidirectional Motion Field-guided VFI (BiM-VFI). Extensive experiments show that BiM-VFI significantly surpasses recent state-of-the-art VFI methods, improving LPIPS by 26% and STLPIPS by 45%, and yields interpolated frames with much less blur at arbitrary time instances.
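To make the time-to-location ambiguity concrete, the sketch below uses a toy one-dimensional setup of our own (not taken from the paper): an object travels from position 0 at t = 0 to position 1 at t = 1 under three hypothetical motion profiles, so all training clips share identical endpoint frames. If a model sees only the endpoints and t, and assuming the three profiles are equally frequent in training, the frame minimizing the expected pixel-wise L2 loss is the pixel-wise mean of the possible targets. The names `PROFILES`, `position`, `render`, and `l2_optimal_frame` are hypothetical.

```python
# Toy illustration (not from the paper): one object, identical endpoint frames,
# three hypothetical motion profiles x(t) = t ** p between x(0) = 0 and x(1) = 1.
PROFILES = {"uniform": 1.0, "accelerating": 2.0, "decelerating": 0.5}
WIDTH = 9  # number of pixels in the toy 1-D frame


def position(t: float, p: float) -> float:
    """Normalized object position at time t; p = 1 is uniform motion."""
    return t ** p


def render(x: float, width: int = WIDTH) -> list[float]:
    """Toy 1-D frame with a single bright pixel at the object's position."""
    frame = [0.0] * width
    frame[round(x * (width - 1))] = 1.0
    return frame


def l2_optimal_frame(t: float) -> list[float]:
    """Pixel-wise mean of the possible target frames at time t.

    Assuming every profile is equally likely given only the endpoints and t,
    this mean minimizes the expected pixel-wise L2 loss.
    """
    targets = [render(position(t, p)) for p in PROFILES.values()]
    return [sum(column) / len(targets) for column in zip(*targets)]


if __name__ == "__main__":
    t = 0.5
    for name, p in PROFILES.items():
        print(f"{name:>12}: bright pixel at index {render(position(t, p)).index(1.0)}")
    print("L2-optimal frame:", [round(v, 2) for v in l2_optimal_frame(t)])
```

At t = 0.5 the three profiles place the object at pixels 4, 2, and 6, and the averaged frame spreads one-third intensity over all three: a ghosted, blurred object rather than a sharp one, which is the failure mode the abstract attributes to TTL ambiguity.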

Paper Structure

This paper contains 30 sections, 1 theorem, 11 equations, 15 figures, and 5 tables.

Key Result

Theorem 1

Let $A$ and $B$ be two fixed, distinct points in the plane, let $k$ be a positive real number, and let $\theta$ be a directed angle. Then the point $X$ such that the distance ratio $\frac{\overline{AX}}{\overline{BX}} = k$ and the directed angle $\angle AXB = \theta$ is unique whenever it exists.
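One way to see the uniqueness, and to compute $X$, is to treat points as complex numbers. Assuming $\theta$ is a directed angle measured from ray $XB$ to ray $XA$ (with an unsigned angle, the reflection of $X$ across line $AB$ satisfies both conditions as well), the two conditions combine into $A - X = k e^{i\theta}(B - X)$, which is linear in $X$, so $X = \frac{k e^{i\theta} B - A}{k e^{i\theta} - 1}$ whenever $k e^{i\theta} \neq 1$. The sketch below implements this; the function name `locate_point` is hypothetical and is not code from the paper.

```python
import cmath


def locate_point(a: complex, b: complex, k: float, theta: float) -> complex:
    """Return X with |A - X| / |B - X| = k and arg((A - X) / (B - X)) = theta.

    Points are complex numbers; theta is a directed angle in radians from
    ray XB to ray XA (an assumption needed for uniqueness in the plane).
    """
    if k <= 0:
        raise ValueError("k must be a positive real number")
    w = k * cmath.exp(1j * theta)  # required complex ratio (A - X) / (B - X)
    if abs(w - 1) < 1e-12:
        # k = 1 and theta = 0 would force A = B, so no X exists for distinct points.
        raise ValueError("no solution: k = 1 and theta = 0 with distinct A and B")
    # A - X = w (B - X)  =>  X (w - 1) = w B - A, a linear equation with one root.
    return (w * b - a) / (w - 1)
```

Uniform motion ties this back to interpolation: with $A$ at 0 and $B$ at 1, the pair $k = t/(1 - t)$, $\theta = \pi$ returns $X = t$, the linearly interpolated position; for example, `locate_point(0, 1, 1/3, math.pi)` returns a value within floating-point error of 0.25.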

Figures (15)

  • Figure 1: Qualitative comparison of our proposed BiM-VFI and SOTA models at arbitrary time instances ($t$ = 0.25, 0.5 and 0.75) for video frame interpolation. The previous SOTA methods yield blurry interpolated frames while our BiM-VFI model generates clear ones.
  • Figure 2: Time-to-location (TTL) ambiguity comparison of motion descriptors (time indexing, distance indexing, our BiM).
  • Figure 3: Our Bidirectional Motion Field-guided VFI (BiM-VFI) with Knowledge Distillation for VFI-Centric Flow supervision (KDVCF).
  • Figure 4: The proposed BiM-guided FlowNet (BiMFN) at the $l$-th pyramid level.
  • Figure 5: Qualitative comparison for fixed-time interpolation datasets, XTest sim2021xvfi.
  • ...and 10 more figures

Theorems & Definitions (2)

  • Theorem 1
  • Proof of Theorem 1