Framer: Interactive Frame Interpolation

Wen Wang, Qiuyu Wang, Kecheng Zheng, Hao Ouyang, Zhekai Chen, Biao Gong, Hao Chen, Yujun Shen, Chunhua Shen

TL;DR

Framer is an interactive frame interpolation framework that lets users steer local motion by constraining the trajectories of selected keypoints, mitigating the ambiguity that arises from the many possible ways of transforming the start frame into the end frame. It builds on a pre-trained image-to-video diffusion model with end-frame conditioning and a ControlNet-like trajectory branch, and adds an autopilot mode that estimates point trajectories via bi-directional tracking. Experiments on image morphing, time-lapse video generation, cartoon interpolation, and novel view synthesis show that Framer achieves superior visual quality and temporal coherence, with user studies favoring its outputs. The work is of practical use in creative workflows and provides a foundation for further controllable video synthesis with diffusion models.

Abstract

We propose Framer for interactive frame interpolation, which aims to produce smoothly transitioning frames between two images according to the user's creative intent. Concretely, besides taking the start and end frames as inputs, our approach supports customizing the transition process by tailoring the trajectory of some selected keypoints. Such a design enjoys two clear benefits. First, incorporating human interaction mitigates the issue arising from the numerous possibilities of transforming one image into another, and in turn enables finer control of local motions. Second, as the most basic form of interaction, keypoints help establish the correspondence across frames, helping the model handle challenging cases (e.g., objects in the start and end frames are of different shapes and styles). It is noteworthy that our system also offers an "autopilot" mode, where we introduce a module to estimate the keypoints and refine the trajectory automatically, to simplify usage in practice. Extensive experimental results demonstrate the appealing performance of Framer on various applications, such as image morphing, time-lapse video generation, cartoon interpolation, etc. The code, the model, and the interface will be released to facilitate further research.

Paper Structure

This paper contains 25 sections, 4 equations, 25 figures, 4 tables.

Figures (25)

  • Figure 1: Showcases produced by our Framer. It facilitates fine-grained customization of local motions and generates varying interpolation results given the same input start and end frame pair (first 3 rows). Moreover, Framer handles challenging cases and can realize smooth image morphing (last 2 rows). The input trajectories are overlaid on the frames.
  • Figure 2: Framer supports (a) a user-interactive mode for customized point trajectories and (b) an "autopilot" mode for video frame interpolation without trajectory inputs. During training, (d) we fine-tune the 3D-UNet of a pre-trained video diffusion model for video frame interpolation. Afterward, (c) we introduce point trajectory control by freezing the 3D-UNet and fine-tuning the controlling branch.
  • Figure 3: Point trajectory estimation. The point trajectory is initialized by interpolating the coordinates of matched keypoints. In each de-noising step, we perform point tracking by finding the nearest neighbor of keypoints in the start and end frames, respectively. Lastly, we check the bi-directional tracking consistency before updating the point coordinate.
  • Figure 4: Qualitative comparison. "GT" stands for ground truth. For each method, we only present the middle frame of 7 interpolated frames. The full results can be seen in \ref{fig:app_comparison_1} and \ref{fig:app_comparison_2} in the Appendix.
  • Figure 5: Results on human preference.
  • ...and 20 more figures
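
The point trajectory estimation summarized in the Figure 3 caption can be illustrated with a small sketch. It is not the authors' implementation: it assumes the two tracking estimates (one obtained from the start frame, one from the end frame) are already available as 2D coordinates, and it does not model the nearest-neighbor search itself. The function names `init_trajectory` and `update_point`, the pixel tolerance `tol`, and the choice to accept the midpoint of the two estimates are all hypothetical, introduced only for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def init_trajectory(start: Point, end: Point, num_frames: int) -> List[Point]:
    """Initialize a trajectory by linearly interpolating a matched keypoint
    from its start-frame position to its end-frame position."""
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    trajectory = []
    for i in range(num_frames):
        t = i / (num_frames - 1)  # 0.0 at the start frame, 1.0 at the end frame
        x = start[0] + t * (end[0] - start[0])
        y = start[1] + t * (end[1] - start[1])
        trajectory.append((x, y))
    return trajectory


def update_point(current: Point, from_start: Point, from_end: Point,
                 tol: float = 2.0) -> Point:
    """Bi-directional consistency check (assumed form).

    `from_start` and `from_end` are the positions suggested by tracking
    against the start and end frames. The point is updated only when the
    two estimates agree within `tol` pixels; otherwise it is left unchanged.
    Using the midpoint as the accepted position is an assumption.
    """
    dx = from_start[0] - from_end[0]
    dy = from_start[1] - from_end[1]
    if (dx * dx + dy * dy) ** 0.5 <= tol:
        return ((from_start[0] + from_end[0]) / 2.0,
                (from_start[1] + from_end[1]) / 2.0)
    return current


if __name__ == "__main__":
    # 7 frames, matching the 7 interpolated frames mentioned for Figure 4.
    traj = init_trajectory((0.0, 0.0), (6.0, 12.0), 7)
    print(traj[3])  # (3.0, 6.0), the middle frame
    print(update_point((1.0, 1.0), (2.0, 2.0), (2.0, 3.0)))    # estimates agree
    print(update_point((1.0, 1.0), (0.0, 0.0), (10.0, 10.0)))  # estimates disagree
```

In this sketch a disagreement between the two directions simply keeps the previous coordinate, which reflects the caption's statement that consistency is checked before the point is updated; how the paper handles rejected updates is not specified in this section.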