What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards

Minh-Quan Le; Yuanzhi Zhu; Vicky Kalogeiton; Dimitris Samaras

TL;DR

This work tackles the mismatch between visual realism and physical realism in video generation by introducing NewtonRewards, a post-training framework that enforces Newtonian dynamics through verifiable rewards derived from measurable proxies: optical flow for velocity and visual features for mass. It defines a kinematic constraint of constant image-plane acceleration and a mass-conservation constraint, and combines them into a post-training objective that guides diffusion-based video generators. On the NewtonBench-60K benchmark, which covers five Newtonian Motion Primitives, NewtonRewards consistently improves physical plausibility, motion smoothness, and temporal coherence, and generalizes well both in-distribution (ID) and out-of-distribution (OOD). The results suggest that physics-grounded verifiable rewards offer a scalable path toward physics-aware video generation and point to a general framework for enforcing other physical laws via proxy-based, differentiable constraints.

Abstract

Recent video diffusion models can synthesize visually compelling clips, yet they often violate basic physical laws: objects float, accelerations drift, and collisions behave inconsistently, revealing a persistent gap between visual realism and physical realism. We propose $\texttt{NewtonRewards}$, the first physics-grounded post-training framework for video generation based on $\textit{verifiable rewards}$. Instead of relying on human or VLM feedback, $\texttt{NewtonRewards}$ extracts $\textit{measurable proxies}$ from generated videos using frozen utility models: optical flow serves as a proxy for velocity, while high-level appearance features serve as a proxy for mass. These proxies enable explicit enforcement of Newtonian structure through two complementary rewards: a Newtonian kinematic constraint enforcing constant-acceleration dynamics, and a mass conservation reward preventing trivial, degenerate solutions. We evaluate $\texttt{NewtonRewards}$ on five Newtonian Motion Primitives (free fall, horizontal/parabolic throw, and ramp sliding down/up) using our newly constructed large-scale benchmark, $\texttt{NewtonBench-60K}$. Across all primitives, on both visual and physics metrics, $\texttt{NewtonRewards}$ consistently improves physical plausibility, motion smoothness, and temporal coherence over prior post-training methods. It further maintains strong performance under out-of-distribution shifts in height, speed, and friction. Our results show that physics-grounded verifiable rewards offer a scalable path toward physics-aware video generation.
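To make the constant-acceleration constraint concrete, the following is a minimal sketch and not the paper's implementation. It assumes the optical-flow proxy has already been reduced to one image-plane velocity vector per frame (for example, the mean flow over the moving object), and it scores a clip by the second finite difference of that velocity, which vanishes exactly when acceleration is constant. The function names `kinematic_residual` and `kinematic_reward` and the squared-error form are illustrative assumptions; the abstract does not specify the exact formulation.

```python
import numpy as np


def kinematic_residual(velocity):
    """Mean squared second difference of a per-frame velocity sequence.

    velocity: array-like of shape (T, 2), hypothetical per-frame image-plane
    velocities (assumed to come from an optical-flow model; not computed here).
    Returns 0.0 when the velocity changes linearly in time, i.e. when the
    acceleration is constant.
    """
    v = np.asarray(velocity, dtype=float)
    if v.ndim != 2 or v.shape[0] < 3:
        raise ValueError("expected shape (T, 2) with T >= 3")
    # Second finite difference: v[t+2] - 2*v[t+1] + v[t].
    second_diff = v[2:] - 2.0 * v[1:-1] + v[:-2]
    return float(np.mean(np.sum(second_diff ** 2, axis=-1)))


def kinematic_reward(velocity):
    """Illustrative reward: higher (closer to zero) means more Newtonian."""
    return -kinematic_residual(velocity)


if __name__ == "__main__":
    t = np.arange(8, dtype=float)
    # Parabolic throw: constant horizontal speed, linearly growing vertical speed.
    throw = np.stack([np.full_like(t, 1.5), 0.3 * t], axis=1)
    # A completely static clip: zero velocity in every frame.
    static = np.zeros((8, 2))
    print(kinematic_residual(throw), kinematic_residual(static))
```

A static clip scores as well as a correct throw under this residual, since zero velocity trivially has constant (zero) acceleration. This is one example of the kind of degenerate solution that a complementary term, such as the mass conservation reward described in the abstract, is meant to rule out.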
