Fréchet Video Motion Distance: A Metric for Evaluating Motion Consistency in Videos
Jiahe Liu, Youran Qu, Qi Yan, Xiaohui Zeng, Lele Wang, Renjie Liao
TL;DR
This work introduces FVMD, a motion-focused metric for evaluating generated videos. It tracks key point trajectories to extract velocity and acceleration features, then compares the distribution of these features against that of ground-truth videos via the Fréchet distance. The method is validated through sanity checks, sensitivity analyses, and large-scale human studies, showing stronger alignment with human judgments than existing metrics such as FVD, FID-VID, and VBench. Incorporating the motion features also improves unary video quality assessment (VQA) models, suggesting applicability beyond pairwise video evaluation. The results indicate that FVMD provides a more faithful measure of temporal motion quality, with practical implications for video generation and evaluation pipelines.
Abstract
Video generative models have advanced significantly in recent years. Video generation is more challenging than image generation: it requires not only high-quality individual frames but also temporal consistency across them. Despite this progress, metrics for evaluating the quality of generated videos, especially their temporal and motion consistency, remain underexplored. To bridge this gap, we propose the Fréchet Video Motion Distance (FVMD), a metric that focuses on evaluating motion consistency in video generation. Specifically, we design explicit motion features based on key point tracking and then measure the similarity between these features via the Fréchet distance. We verify the effectiveness of FVMD through a sensitivity analysis that injects noise into real videos, and we carry out a large-scale human study. Together, these show that our metric effectively detects temporal noise and aligns better with human perception of generated video quality than existing metrics. Additionally, our motion features consistently improve the performance of Video Quality Assessment (VQA) models, indicating that our approach is also applicable to unary video quality evaluation. Code is available at https://github.com/ljh0v0/FMD-frechet-motion-distance.
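
To make the idea of "motion features compared via the Fréchet distance" concrete, here is a minimal sketch in Python. It is not the released implementation: it assumes key point tracks are already available as an array of (x, y) positions, uses plain finite differences as stand-in velocity and acceleration features, and fits one Gaussian per video set. The function names `motion_features` and `frechet_distance`, and the array layout, are hypothetical choices made for illustration only.

```python
import numpy as np


def motion_features(tracks):
    """Flatten finite-difference velocity and acceleration into one vector per video.

    tracks: array of shape (num_videos, num_frames, num_points, 2) holding
    (x, y) key point positions. This layout is an assumption of the sketch.
    """
    tracks = np.asarray(tracks, dtype=np.float64)
    velocity = np.diff(tracks, n=1, axis=1)      # first temporal difference
    acceleration = np.diff(tracks, n=2, axis=1)  # second temporal difference
    num_videos = tracks.shape[0]
    return np.concatenate(
        [velocity.reshape(num_videos, -1), acceleration.reshape(num_videos, -1)],
        axis=1,
    )


def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets.

    d^2 = ||mu_a - mu_b||^2 + Tr(C_a) + Tr(C_b) - 2 Tr((C_a C_b)^(1/2))
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))
    # The product of two positive semi-definite matrices has real, non-negative
    # eigenvalues, so the trace of its square root is the sum of their roots.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic smooth trajectories: 200 videos, 4 frames, 2 key points.
    real = np.cumsum(rng.normal(size=(200, 4, 2, 2)), axis=1)
    # Per-frame jitter imitates temporally inconsistent motion.
    noisy = real + rng.normal(scale=2.0, size=real.shape)
    print(frechet_distance(motion_features(real), motion_features(real)))
    print(frechet_distance(motion_features(real), motion_features(noisy)))
```

In this toy setup the distance of a set to itself is numerically zero, and per-frame jitter increases it, which mirrors the noise-injection sensitivity analysis described in the abstract. The feature construction in the paper may well differ from these raw differences, so the sketch should be read only as an illustration of the overall pipeline.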
