Objects in Generated Videos Are Slower Than They Appear: Models Suffer Sub-Earth Gravity and Don't Know Galileo's Principle...for now

Varun Varma Thozhiyoor, Shivam Tripathi, Venkatesh Babu Radhakrishnan, Anand Bhattad

TL;DR

This work reveals that contemporary video generators systematically under-accelerate falling objects, effectively exhibiting sub-Earth gravity, and often violate Galileo's principle that gravitational acceleration is the same regardless of starting height. The authors introduce a unit-free, two-object timing protocol that cancels scale and time-base confounds, providing a robust diagnostic of physical understanding. They show widespread gravity violations across state-of-the-art models and demonstrate that a lightweight LoRA adapter trained on 100 synthetic sequences can substantially improve gravity metrics and generalize zero-shot to two-ball drops, inclined planes, and real-world data (PISA). The findings suggest video models encode latent physical structure but require targeted specialization to act as reliable world models, with implications for physics-guided generation and efficient correction strategies. Overall, the paper establishes a calibration-free framework for probing fundamental physics in video generation and demonstrates that minimal data can meaningfully align models with physical laws.
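
A short derivation shows why a two-object timing ratio is unit-free. It is a sketch under assumptions not spelled out in the summary: both balls start from rest, share one constant acceleration $a$, and are observed through a single unknown length scale $s$ and a single unknown time scale $\beta$ (symbols introduced here for illustration only).

```latex
% Assumed model: ball i falls from rest through height h_i under constant acceleration a.
\begin{align}
  h_i &= \tfrac{1}{2}\, a\, t_i^2, \qquad i \in \{1, 2\}, \\
  \frac{t_1^2}{t_2^2} &= \frac{2 h_1 / a}{2 h_2 / a} = \frac{h_1}{h_2}.
\end{align}
% Measured quantities with unknown length scale s and unknown time base \beta
% (for example, a wrong frame-rate assumption):
\begin{align}
  \hat{h}_i &= s\, h_i, \qquad \hat{t}_i = \beta\, t_i, \\
  \frac{\hat{t}_1^2}{\hat{t}_2^2}
    &= \frac{\beta^2 t_1^2}{\beta^2 t_2^2}
     = \frac{h_1}{h_2}
     = \frac{s\, h_1}{s\, h_2}
     = \frac{\hat{h}_1}{\hat{h}_2}.
\end{align}
```

The acceleration $a$ and both scale factors drop out, so under these assumptions a violation of the ratio cannot be explained by a mis-set frame rate or metric scale alone.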

Abstract

Video generators are increasingly evaluated as potential world models, which requires them to encode and understand physical laws. We investigate their representation of a fundamental law: gravity. Out-of-the-box video generators consistently generate objects falling at an effectively slower acceleration. However, these physical tests are often confounded by ambiguous metric scale. We first investigate if observed physical errors are artifacts of these ambiguities (e.g., incorrect frame rate assumptions). We find that even temporal rescaling cannot correct the high-variance gravity artifacts. To rigorously isolate the underlying physical representation from these confounds, we introduce a unit-free, two-object protocol that tests the timing ratio $t_1^2/t_2^2 = h_1/h_2$, a relationship independent of $g$, focal length, and scale. This relative test reveals violations of Galileo's equivalence principle. We then demonstrate that this physical gap can be partially mitigated with targeted specialization. A lightweight low-rank adaptor fine-tuned on only 100 single-ball clips raises $g_{\mathrm{eff}}$ from $1.81\,\mathrm{m/s^2}$ to $6.43\,\mathrm{m/s^2}$ (reaching $65\%$ of terrestrial gravity). This specialist adaptor also generalizes zero-shot to two-ball drops and inclined planes, offering initial evidence that specific physical laws can be corrected with minimal data.
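
To make the quoted numbers concrete, here is a minimal sketch of how an effective gravity could be estimated from a tracked fall and compared with Earth's value. The function name `effective_gravity`, the through-origin least-squares fit, and the synthetic Moon-like trajectory are illustrative assumptions, not the authors' measurement pipeline.

```python
import numpy as np

G_EARTH = 9.81  # m/s^2, terrestrial reference value


def effective_gravity(times_s, drops_m):
    """Least-squares g for the model d = 0.5 * g * t^2 (release from rest).

    Minimizing sum((d_i - 0.5 * g * t_i^2)^2) over g gives
    g = 2 * sum(d_i * t_i^2) / sum(t_i^4).
    """
    t2 = np.asarray(times_s, dtype=float) ** 2
    d = np.asarray(drops_m, dtype=float)
    return 2.0 * float(np.dot(d, t2) / np.dot(t2, t2))


# Hypothetical tracked trajectory: a noise-free fall at Moon-like gravity.
t = np.linspace(0.0, 1.0, 25)   # seconds since release
d_moon = 0.5 * 1.6 * t**2       # metres fallen
g_hat = effective_gravity(t, d_moon)

# Fractions of terrestrial gravity for the values quoted in the abstract.
frac_before = 1.81 / G_EARTH
frac_after = 6.43 / G_EARTH

print(f"recovered g: {g_hat:.3f} m/s^2")
print(f"before adaptor: {frac_before:.1%}, after adaptor: {frac_after:.1%}")
```

With real videos the drop distances would come from an object tracker and a metric-scale assumption, which is exactly the confound the two-object ratio test is designed to avoid.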

Paper Structure

This paper contains 36 sections, 1 equation, 16 figures, 6 tables.

Figures (16)

  • Figure 1: Video generators produce physically implausible slow-motion falls and most fail to understand that objects fall at equal rates. We visualize two identical balls dropped simultaneously from different heights using stroboscopic (time-lapse) composites, tracking motion until the lower ball impacts the ground. Two failures emerge: (1) Galileo's principle violations: Under Galileo's principle, both balls should fall equal distances in equal time regardless of starting height. Red dashed lines mark the expected position of the higher ball if both fell at equal rates. Earth (leftmost; simulation from Blender) shows the higher ball reaching this expected position, confirming correct physics. Moon reference ($g \approx 1.6\,\mathrm{m/s^2}$) shows both balls falling slower but preserving equal-rate progression. In contrast, Wan 14B and Veo3 show the higher ball severely lagging behind the red line. It traveled far less distance than the lower ball despite falling for the same time, violating the fundamental principle that gravitational acceleration is universal. Even Wan 5B shows noticeable lag. (2) Severe under-acceleration: The spacing between successive ball positions indicates effective acceleration; wider spacing means higher acceleration. Most models exhibit compressed spacing comparable to Moon ($1.6\,\mathrm{m/s^2}$) or Mars ($3.7\,\mathrm{m/s^2}$) rather than Earth's $9.81\,\mathrm{m/s^2}$, revealing motion dramatically slower than terrestrial physics. Our Gravity Adapter (second panel), fine-tuned on Wan 5B with just 100 examples, corrects both failures by bringing the higher ball to the expected position and improving Earth-like spacing.
  • Figure 2: Effect of time-scaling on $h$–$t$ relationships. (a) We plot $h$ versus $t$ for all models. We repeat each test example with 4 seeds and fit polynomials through the means. The gray dashed line indicates terrestrial motion. All models systematically under-accelerate, and none obey the square-root scaling of time with height. The Gravity Adapters (green, gold) substantially improve Wan 5B and Wan 14B towards correct gravity. (b) Mean time scaling. We compute a mean time scalar (MTS) from a random subset of 30 samples from our dataset; it rescales the effective time of those 30 samples to better match the ground-truth time. When applied to a second subset of 45 samples, the MTS brings the mean effective gravity closer to $9.81\,\mathrm{m/s^2}$ for many models, but the variance remains high, indicating that under-acceleration is not simply a frame-rate artifact.
  • Figure 3: Scaled single-ball drops reveal systematic under-acceleration across all models. Stroboscopic composites (left) visualize ball positions at equal time intervals from release. The panels show the trajectory each model produces during the time it takes a ball falling under $9.8\,\mathrm{m/s^2}$ to reach the ground, with time scaled by the mean time scalar, MTS (Tab. \ref{tab:time_scaled_single_ball_table}). All models show severe under-acceleration, easily visible as compressed spacing in the composites. The Gravity Adapters (seventh and eighth columns) substantially improve Wan 5B and Wan 14B toward terrestrial dynamics.
  • Figure 4: Two-ball relative timing results. We plot measured timing ratios $t_1^2/t_2^2$ against theoretical predictions $h_1/h_2$ across multiple height ratios. The gray dashed line indicates perfect agreement. All models deviate systematically, confirming that under-acceleration is not an artifact of scale estimation but reflects genuine violations of physics and of Galileo's principle. We also report the fitted slope ($m$) for each model to quantify the deviation.
  • Figure 5: Most models fail Galileo's principle of gravitational equivalence. We freeze each video at the moment the lower ball (dropped from height $h_1$) impacts the ground. Under correct physics, both balls should have fallen equal distances in equal time, demonstrating that gravitational acceleration is universal (leftmost column: Ground Truth Earth). Catastrophic failures: Veo3 (sixth column) shows the higher ball barely moving while the lower ball lands, a complete violation of 400-year-old physics. Wan 14B, Veo3, and Cosmos 14B show similar failures with the higher ball remaining significantly elevated, suggesting these models believe gravity depends on the starting height or object ordering. Correction: The Gravity Adapter (second column), fine-tuned on Wan 5B, restores near-terrestrial acceleration and perfects equal-rate falling, demonstrating that this fundamental physics deficit can be corrected with only 100 training examples.
  • ...and 11 more figures
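
The two summary statistics named in the captions for Figures 2 and 4, a mean time scalar and a fitted slope of $t_1^2/t_2^2$ against $h_1/h_2$, could be computed roughly as sketched below. The captions do not give exact definitions, so everything here is an assumption made for illustration: the function names, the choice of a mean ratio of true to measured fall times as the scalar, the line fit with a free intercept, and the toy data. For brevity the scalar is calibrated and applied on the same samples, whereas the caption describes separate subsets of 30 and 45.

```python
import numpy as np

G_EARTH = 9.81  # m/s^2


def timing_ratio_slope(h_ratio, t2_ratio):
    """Fit t1^2/t2^2 = m * (h1/h2) + c. Galileo's principle predicts m = 1, c = 0."""
    m, c = np.polyfit(np.asarray(h_ratio, float), np.asarray(t2_ratio, float), deg=1)
    return float(m), float(c)


def mean_time_scalar(t_measured, t_true):
    """Assumed definition: average factor mapping measured fall times to true ones."""
    return float(np.mean(np.asarray(t_true, float) / np.asarray(t_measured, float)))


def gravity_after_scaling(heights_m, t_measured, scalar):
    """Per-sample g = 2h / (scalar * t)^2 after rescaling time by one global factor."""
    t = scalar * np.asarray(t_measured, float)
    return 2.0 * np.asarray(heights_m, float) / t**2


# Ideal two-ball data lies on the diagonal of the ratio plot.
m_ideal, c_ideal = timing_ratio_slope([0.25, 0.5, 0.75], [0.25, 0.5, 0.75])

# Hypothetical drop heights and the fall times Earth gravity would give.
heights = np.array([1.0, 1.5, 2.0, 2.5])
t_true = np.sqrt(2.0 * heights / G_EARTH)

# Case 1: every clip is slowed by the same factor, so one scalar fixes all of them.
t_uniform = 2.0 * t_true
g_uniform = gravity_after_scaling(
    heights, t_uniform, mean_time_scalar(t_uniform, t_true)
)

# Case 2: the slowdown differs per clip, so a single scalar leaves a large spread.
stretch = np.array([1.5, 3.0, 1.5, 3.0])
t_mixed = stretch * t_true
g_mixed = gravity_after_scaling(
    heights, t_mixed, mean_time_scalar(t_mixed, t_true)
)

print("ideal slope and intercept:", m_ideal, c_ideal)
print("std of g, uniform slowdown:", g_uniform.std())
print("std of g, mixed slowdown:", g_mixed.std())
```

The second case mirrors the argument in the Figure 2 caption: if under-acceleration were only a frame-rate artifact, a single time rescaling would collapse the spread in effective gravity, and in this toy setting it does not.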