Generalizable Implicit Motion Modeling for Video Frame Interpolation
Zujin Guo, Wei Li, Chen Change Loy
TL;DR
This work tackles the challenge of modeling complex spatiotemporal motion for video frame interpolation (VFI) by introducing Generalizable Implicit Motion Modeling (GIMM). GIMM extracts motion priors from bidirectional flows using a Motion Encoder and forward warping, producing an instance-specific motion latent. This latent conditions an adaptive coordinate-based network that predicts continuous bilateral flows for arbitrary timestamps. The approach can be plugged into existing flow-based VFI pipelines (e.g., AMT) to produce high-quality interpolations, and it achieves state-of-the-art performance on arbitrary-timestep interpolation benchmarks, including Vimeo90K-derived motion modeling benchmarks and SNU-FILM-arb/XTest. The results indicate that generalizable implicit motion modeling conditioned on input-specific motion priors yields more accurate and coherent motion representations across diverse videos, with practical implications for slow-motion synthesis, video editing, and compression.
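The conditioning step described above can be illustrated with a minimal sketch: a small coordinate-based network that takes normalized spatiotemporal coordinates (x, y, t) together with a per-pixel motion latent and returns a flow field for the queried timestamp. Everything here is an assumption made for illustration, not the authors' architecture: the function names, the two-layer ReLU MLP, the latent shape, and the four-channel output (two channels per direction) are hypothetical, and the weights are random and untrained, so only the data flow is meaningful.

```python
import numpy as np


def make_coords(height, width, t):
    """Return an (H*W, 3) array of (x, y, t), with x and y normalized to [-1, 1]."""
    ys, xs = np.meshgrid(
        np.linspace(-1.0, 1.0, height),
        np.linspace(-1.0, 1.0, width),
        indexing="ij",
    )
    ts = np.full_like(xs, t)
    return np.stack([xs, ys, ts], axis=-1).reshape(-1, 3)


def init_mlp(rng, in_dim, hidden_dim, out_dim):
    """Randomly initialize a two-layer MLP (hypothetical stand-in network)."""
    return {
        "w1": rng.normal(0.0, 0.1, (in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "w2": rng.normal(0.0, 0.1, (hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }


def predict_bilateral_flow(params, latent, t):
    """Query the network at timestamp t.

    latent: (H, W, C) per-pixel motion latent (assumed shape).
    Returns an (H, W, 4) array: channels 0-1 for flow t->0, 2-3 for flow t->1
    (an assumed output layout).
    """
    h, w, c = latent.shape
    coords = make_coords(h, w, t)
    # Concatenate coordinates with the latent so the output depends on both.
    x = np.concatenate([coords, latent.reshape(-1, c)], axis=1)
    hidden = np.maximum(x @ params["w1"] + params["b1"], 0.0)  # ReLU
    out = hidden @ params["w2"] + params["b2"]
    return out.reshape(h, w, 4)


rng = np.random.default_rng(0)
latent = rng.normal(size=(6, 8, 8))          # placeholder for an encoded motion latent
params = init_mlp(rng, in_dim=3 + 8, hidden_dim=32, out_dim=4)
flow_quarter = predict_bilateral_flow(params, latent, t=0.25)
flow_three_quarters = predict_bilateral_flow(params, latent, t=0.75)
```

Because t enters as a continuous input, the same network can be queried at any timestamp between the two frames without retraining, which is the property that makes a coordinate-based formulation attractive for arbitrary-timestep interpolation.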
Abstract
Motion modeling is critical in flow-based Video Frame Interpolation (VFI). Existing paradigms either consider linear combinations of bidirectional flows or directly predict bilateral flows for given timestamps without exploring favorable motion priors, and thus cannot effectively model spatiotemporal dynamics in real-world videos. To address this limitation, we introduce Generalizable Implicit Motion Modeling (GIMM), a novel and effective approach to motion modeling for VFI. Specifically, we design a motion encoding pipeline that derives a spatiotemporal motion latent from bidirectional flows extracted by pre-trained flow estimators, effectively representing input-specific motion priors. We then implicitly predict optical flows at arbitrary timesteps between two adjacent input frames via an adaptive coordinate-based neural network that takes spatiotemporal coordinates and the motion latent as inputs. GIMM can be easily integrated with existing flow-based VFI works by supplying accurately modeled motion. We show that GIMM performs better than the current state of the art on standard VFI benchmarks.
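To make the first paradigm concrete, the sketch below shows one well-known way to linearly combine bidirectional flows into bilateral flows at timestamp t, in the style popularized by Super SloMo. The abstract does not say which formulation it has in mind, so this choice, along with the function and variable names, is an assumption for illustration. The formula assumes locally linear motion and approximates the flow at each pixel of the intermediate frame with the input flows at the same pixel location.

```python
import numpy as np


def linear_bilateral_flows(flow_01, flow_10, t):
    """Approximate flows from the intermediate frame at time t to frames 0 and 1.

    flow_01: (H, W, 2) flow from frame 0 to frame 1.
    flow_10: (H, W, 2) flow from frame 1 to frame 0.
    t: timestamp in (0, 1).
    Assumes linear motion between the two input frames.
    """
    flow_t0 = -(1.0 - t) * t * flow_01 + t * t * flow_10
    flow_t1 = (1.0 - t) ** 2 * flow_01 - t * (1.0 - t) * flow_10
    return flow_t0, flow_t1


# Uniform translation by d = (4, -2) pixels: flow_01 = d and flow_10 = -d.
d = np.array([4.0, -2.0])
flow_01 = np.broadcast_to(d, (3, 5, 2)).copy()
flow_10 = -flow_01
flow_t0, flow_t1 = linear_bilateral_flows(flow_01, flow_10, t=0.25)
# For pure translation the result is exact: flow_t0 = -t * d, flow_t1 = (1 - t) * d.
```

The combination is exact only when motion is linear and spatially uniform; for nonlinear trajectories or near motion boundaries it is a rough approximation, which is the limitation that motivates modeling motion with learned priors.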
