MotionPro: A Precise Motion Controller for Image-to-Video Generation

Zhongwei Zhang; Fuchen Long; Zhaofan Qiu; Yingwei Pan; Wu Liu; Ting Yao; Tao Mei

TL;DR

This paper tackles controllable motion in image-to-video diffusion by overcoming coarse motion and motion-category ambiguity inherent to Gaussian-extended trajectories. It introduces MotionPro, which jointly uses region-wise trajectories sampled from local optical-flow regions and a motion mask derived from flow maps, enabling precise fine-grained motion and robust object-versus-camera motion understanding. The method builds on Stable Video Diffusion with a motion encoder that modulates video latents through adaptive feature modulation, enhanced by LoRA in all attention modules. Evaluations on WebVid-10M and the newly curated MC-Bench demonstrate state-of-the-art performance in both fine-grained and object-level motion control, with improved trajectory alignment and richer motion dynamics. The MC-Bench benchmark further provides a standardized, annotated dataset for evaluating controllable I2V motion, reinforcing the practical impact of region-wise motion conditioning for interactive video generation.
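The summary above mentions that a motion encoder modulates video latents through adaptive feature modulation. The exact formulation is not given here, so the following is only a minimal sketch that assumes a FiLM-style scale-and-shift: a per-channel scale and shift are predicted from the motion condition by linear maps and applied to the features. The function name `modulate`, the weight matrices `w_scale` and `w_shift`, and the `1 + scale` form are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def modulate(features, cond, w_scale, w_shift):
    """Scale-and-shift feature modulation (hypothetical sketch).

    features: (N, C) array of video latent features.
    cond:     (N, D) array of motion-condition features.
    w_scale, w_shift: (D, C) linear maps (assumed, for illustration only).
    """
    scale = cond @ w_scale  # (N, C) per-channel scale predicted from the condition
    shift = cond @ w_shift  # (N, C) per-channel shift predicted from the condition
    # With zero weights this reduces to the identity, so the conditioning
    # branch starts out as a no-op on the base features.
    return features * (1.0 + scale) + shift


features = np.ones((2, 3))
cond = np.ones((2, 4))
out = modulate(features, cond, np.full((4, 3), 0.25), np.full((4, 3), 0.5))
```

Under these assumptions, initializing both weight matrices to zero leaves the pretrained features untouched, which is one common reason to choose this form when adding a condition to an existing model.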

Abstract

Animating images with interactive motion control has garnered popularity for image-to-video (I2V) generation. Modern approaches typically rely on large Gaussian kernels to extend motion trajectories as the condition, without explicitly defining the movement region, which leads to coarse motion control and fails to disentangle object motion from camera motion. To alleviate these issues, we present MotionPro, a precise motion controller that leverages region-wise trajectories and a motion mask to regulate fine-grained motion synthesis and to identify the target motion category (i.e., object or camera motion), respectively. Technically, MotionPro first estimates the flow maps of each training video via a tracking model, and then samples region-wise trajectories to simulate the inference scenario. Instead of extending flow through large Gaussian kernels, our region-wise trajectory approach enables more precise control by directly utilizing trajectories within local regions, thereby effectively characterizing fine-grained movements. A motion mask is simultaneously derived from the predicted flow maps to capture the holistic motion dynamics of the movement regions. To pursue natural motion control, MotionPro further strengthens video denoising by incorporating both the region-wise trajectories and the motion mask through feature modulation. In addition, we construct a benchmark, MC-Bench, with 1.1K user-annotated image-trajectory pairs, for the evaluation of both fine-grained and object-level I2V motion control. Extensive experiments conducted on WebVid-10M and MC-Bench demonstrate the effectiveness of MotionPro. Please refer to our project page for more results: https://zhw-zhang.github.io/MotionPro-page/.
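To make the two conditioning signals described in the abstract concrete, here is a minimal sketch of how region-wise trajectories and a motion mask might be derived from a dense flow map. It assumes the flow is stored as a `(T, H, W, 2)` array of per-pixel displacements. The region size, keep ratio, magnitude threshold, and both function names are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np


def motion_mask(flow, threshold=1.0):
    """Binary (H, W) mask of pixels whose mean flow magnitude exceeds `threshold`.

    flow: (T, H, W, 2) per-frame displacement field (assumed layout).
    """
    magnitude = np.linalg.norm(flow, axis=-1)  # (T, H, W)
    return (magnitude.mean(axis=0) > threshold).astype(np.float32)


def sample_region_trajectories(flow, region=8, keep_ratio=0.1, rng=None):
    """Keep flow only inside randomly selected `region` x `region` blocks.

    Returns the sparse flow and the boolean (H, W) selection map.
    """
    rng = np.random.default_rng() if rng is None else rng
    _, height, width, _ = flow.shape
    if height % region or width % region:
        raise ValueError("H and W must be divisible by the region size")
    # One Bernoulli draw per local region decides whether its flow is kept.
    keep = rng.random((height // region, width // region)) < keep_ratio
    # Expand the per-region decision to per-pixel resolution.
    pixel_keep = np.repeat(np.repeat(keep, region, axis=0), region, axis=1)
    sparse_flow = flow * pixel_keep[None, :, :, None]
    return sparse_flow, pixel_keep


# Toy example: only the top-left 8x8 block moves, with displacement (3, 4).
flow = np.zeros((4, 16, 16, 2), dtype=np.float32)
flow[:, :8, :8] = (3.0, 4.0)
mask = motion_mask(flow)
sparse_flow, selected = sample_region_trajectories(
    flow, region=8, keep_ratio=0.5, rng=np.random.default_rng(0)
)
```

In this sketch the sparse flow keeps the original displacements inside each selected block rather than blurring them outward, which is the contrast with Gaussian-extended trajectories that the abstract draws. The mask, by comparison, marks every moving pixel regardless of which blocks were sampled.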
