Video-Driven Graph Network-Based Simulators

Franciszek Szewczyk, Gilles Louppe, Matthia Sabatelli

TL;DR

The paper addresses inferring physical properties from short videos to drive a Graph Network-based Simulator (GNS) without explicit parameter inputs. It introduces the Video-Driven Graph Network-based Simulator (VDGNS), which combines a Video Encoder that outputs a latent physical encoding $P$ with a GNS that predicts accelerations via a semi-implicit Euler integrator; the model is trained end-to-end by regressing final accelerations while decoupling motion cues from video content. Empirical results on four Taichi-MPM material classes (water, sand, snow, elastic) show that video encodings separate material properties and exhibit a strong linear relationship with predicted motion, with performance close to a Baseline that uses explicit encodings and with robustness to noise. The findings suggest that video-driven encodings can enable realistic, adaptable physics simulations for design, animation, and gaming, and point to future directions toward unsupervised encodings and real-world video data.

Abstract

Lifelike visualizations in design, cinematography, and gaming rely on precise physics simulations, typically requiring extensive computational resources and detailed physical input. This paper presents a method that can infer a system's physical properties from a short video, eliminating the need for explicit parameter input, provided it is close to the training condition. The learned representation is then used within a Graph Network-based Simulator to emulate the trajectories of physical systems. We demonstrate that the video-derived encodings effectively capture the physical properties of the system and showcase a linear dependence between some of the encodings and the system's motion.

Paper Structure (12 sections, 4 figures, 2 tables)

Figures (4)

  • Figure 1: The video encoding $P$ is combined with vertex attributes to form the initial graph $G_0$. Message passing, performed by the Graph Processor, iterates $M$ times. Subsequently, dynamic data is decoded, and particle positions are updated using an integrator.
  • Figure 2: Top-left: Percentage increase in one-step MSE as noise level increases, relative to the MSE at zero noise. Top-middle: Mean Wasserstein distance between particles in true and predicted trajectories. Top-right: Kernel density estimate of the first two principal components of video encodings for each of the four classes. Thick lines represent the 85th percentile density contours. Bottom-left: Silhouette scores of video encodings for videos of varying lengths. Bottom-middle: The $R^2$ score of the linear relation between the video encodings and the predicted accelerations. Bottom-right: one-step MSE for varying angles of friction, showing the interpolation between water and sand.
  • Figure 3: Sample trajectories and corresponding videos for each of the four classes. Time flows right.
  • Figure 4: Comparison of true trajectories and trajectories predicted using our approach in the Fluid experiment. Time flows right.
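The final stage described in the Figure 1 caption, where decoded accelerations are turned into updated particle positions, can be sketched with a semi-implicit Euler step, the integrator named in the TL;DR. This is a minimal illustration rather than the authors' code: the function name, the NumPy array layout (one row per particle), and the time step `dt = 1.0` are all assumptions.

```python
import numpy as np

def semi_implicit_euler_step(positions, velocities, accelerations, dt=1.0):
    """Advance particles by one step (hypothetical helper, not from the paper).

    Semi-implicit Euler updates the velocity first, then uses the *new*
    velocity to update the position.
    """
    new_velocities = velocities + dt * accelerations
    new_positions = positions + dt * new_velocities
    return new_positions, new_velocities

# Two particles in 2D, initially at rest, with a constant downward acceleration
# standing in for the accelerations a simulator would predict.
x = np.array([[0.0, 1.0], [0.5, 1.0]])
v = np.zeros_like(x)
a = np.array([[0.0, -0.5], [0.0, -0.5]])

x1, v1 = semi_implicit_euler_step(x, v, a)
print(x1)
print(v1)
```

Because the position update uses the freshly updated velocity, a rollout only needs the predicted acceleration at each step; repeating the call with new predictions would yield a full trajectory.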