The Pulse of Motion: Measuring Physical Frame Rate from Visual Dynamics

Xiangbo Gao, Mingyang Wu, Siyuan Yang, Jiongze Yu, Pardis Taghavi, Fangzhou Lin, Zhengzhong Tu

Abstract

While recent generative video models have achieved remarkable visual realism and are being explored as world models, true physical simulation requires mastering both space and time. Current models can produce visually smooth kinematics, yet they lack a reliable internal motion pulse to ground these motions in a consistent, real-world time scale. This temporal ambiguity stems from the common practice of indiscriminately training on videos with vastly different real-world speeds, forcing them into standardized frame rates. This leads to what we term chronometric hallucination: generated sequences exhibit ambiguous, unstable, and uncontrollable physical motion speeds. To address this, we propose Visual Chronometer, a predictor that recovers the Physical Frames Per Second (PhyFPS) directly from the visual dynamics of an input video. Trained via controlled temporal resampling, our method estimates the true temporal scale implied by the motion itself, bypassing unreliable metadata. To systematically quantify this issue, we establish two benchmarks, PhyFPS-Bench-Real and PhyFPS-Bench-Gen. Our evaluations reveal a harsh reality: state-of-the-art video generators suffer from severe PhyFPS misalignment and temporal instability. Finally, we demonstrate that applying PhyFPS corrections significantly improves the human-perceived naturalness of AI-generated videos. Our project page is https://xiangbogaobarry.github.io/Visual_Chronometer/.

Paper Structure (30 sections, 5 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Visualization of Chronometric Hallucination. Current video generators sometimes fail to ground their outputs in a consistent physical time scale, even when the prompt contains no speed-manipulating keywords (e.g., "slow motion"). (a) A hummingbird hawk-moth is rendered in extreme slow motion, despite its naturally high wing-beat frequency. (b) A person falls onto a bed far more slowly than free fall under standard gravity would allow. These instances illustrate Chronometric Hallucination: a prevalent failure mode where generated motions exhibit an ambiguous, unstable, and uncontrollable physical time scale.
  • Figure 2: Physics-Grounded Temporal Augmentation. We synthesize diverse low-rate videos from high-frequency source data (240 FPS) to simulate real-world camera mechanics: Sharp Capture, Motion Blur, and Rolling Shutter.
  • Figure 3: Dataset distribution across 18 target Physical Frame Rates.
  • Figure 4: Human Perceptual Preference on Temporal Naturalness. Bradley-Terry scores comparing the original generated videos against our post-processed variants. Both the global average correction (Pred) and dynamic local correction (Pred Dyn) are strongly preferred over the hallucinated original outputs, with 90% confidence intervals indicating statistical significance.
  • Figure 5: Continuous PhyFPS Prediction on Real Dynamics. Qualitative results from our Visual Chronometer evaluating a single dynamic action (soccer ball juggling) captured at three distinct physical frame rates (60, 24, and 12 PhyFPS). The model not only accurately recovers the absolute time scale directly from visual cues but also maintains remarkable temporal stability across the entire sequence.
  • ...and 1 more figures
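
The temporal resampling described in the abstract and in the Figure 2 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `shutter` parameter, and the restriction to target rates that divide 240 evenly are assumptions made for illustration, and the Rolling Shutter mode is omitted. "Sharp Capture" is approximated by keeping every k-th source frame, and "Motion Blur" by averaging the source frames that fall inside each output frame's exposure window.

```python
import numpy as np

SOURCE_FPS = 240  # source rate stated in the Figure 2 caption


def _step(target_fps: int, source_fps: int) -> int:
    # Assumption: only target rates that divide the source rate evenly.
    if target_fps <= 0 or source_fps % target_fps != 0:
        raise ValueError("target_fps must be a positive divisor of source_fps")
    return source_fps // target_fps


def resample_sharp(frames: np.ndarray, target_fps: int,
                   source_fps: int = SOURCE_FPS) -> np.ndarray:
    """Hypothetical 'Sharp Capture': keep every k-th source frame."""
    return frames[::_step(target_fps, source_fps)]


def resample_motion_blur(frames: np.ndarray, target_fps: int,
                         shutter: float = 0.5,
                         source_fps: int = SOURCE_FPS) -> np.ndarray:
    """Hypothetical 'Motion Blur': average the source frames inside each
    exposure window. `shutter` is the assumed fraction of the output frame
    interval during which the virtual shutter is open."""
    step = _step(target_fps, source_fps)
    window = max(1, round(step * shutter))
    n_out = len(frames) // step
    blurred = [frames[i * step: i * step + window].mean(axis=0)
               for i in range(n_out)]
    return np.stack(blurred)


if __name__ == "__main__":
    # One second of synthetic 240 FPS video: 240 frames of 4x4 grayscale.
    video = np.random.rand(240, 4, 4)
    print(resample_sharp(video, 24).shape)        # 24 output frames
    print(resample_motion_blur(video, 24).shape)  # 24 output frames
```

Because the target rate is known when each clip is synthesized, every resampled clip comes with a PhyFPS label, which is what makes this kind of augmentation usable as supervision for a frame-rate predictor.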