Visual Physics: Discovering Physical Laws from Videos

Pradyumna Chari, Chinmay Talegaonkar, Yunhao Ba, Achuta Kadambi

TL;DR

The paper addresses the problem of discovering physical laws from video by proposing Visual Physics, a three-component pipeline that jointly learns governing equations and their parameters. It combines a Mask R-CNN based position detector, a beta-VAE-driven latent physics module, and a Eureqa-style genetic programming step for equation discovery that yields symbolic, interpretable formulas. Demonstrations on synthetic and real 2D motion tasks show symbolically accurate expressions and affine mappings between latent nodes and ground-truth parameters, with robustness to noise and varying data sizes. The work advances unsupervised physics discovery from visual data and provides a publicly released dataset to support future research.

Abstract

In this paper, we teach a machine to discover the laws of physics from video streams. We assume no prior knowledge of physics, beyond a temporal stream of bounding boxes. The problem is very difficult because a machine must learn not only a governing equation (e.g. projectile motion) but also the existence of governing parameters (e.g. velocities). We evaluate our ability to discover physical laws on videos of elementary physical phenomena, such as projectile motion or circular motion. These elementary tasks have textbook governing equations and enable ground truth verification of our approach.
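
To make the distinction between a governing equation and its governing parameters concrete, here is a minimal Python sketch. It is not the authors' method: it assumes the textbook projectile equation is already known (the paper assumes no such prior) and only recovers the two initial velocities from a simulated position stream by least squares. The function names, the value of `G`, and the sample velocities are all hypothetical choices made for illustration.

```python
import numpy as np

G = 9.81  # assumed gravitational acceleration in m/s^2 (illustrative value)


def simulate_toss(vx, vy, t):
    """Textbook drag-free projectile positions, launched from the origin."""
    x = vx * t
    y = vy * t - 0.5 * G * t**2
    return x, y


def recover_velocities(x, y, t):
    """Least-squares estimate of (vx, vy), assuming the equation form is known."""
    # x = vx * t, so vx is the slope of x against t through the origin.
    vx = np.dot(t, x) / np.dot(t, t)
    # Adding back the gravity term leaves y + 0.5*G*t^2 = vy * t.
    vy = np.dot(t, y + 0.5 * G * t**2) / np.dot(t, t)
    return vx, vy


# Hypothetical trajectory: 30 samples over one second.
t = np.linspace(0.0, 1.0, 30)
x, y = simulate_toss(3.0, 4.0, t)
vx_hat, vy_hat = recover_velocities(x, y, t)
print(vx_hat, vy_hat)
```

The task described in the abstract is harder than this sketch: neither the functional form nor the number and meaning of the parameters is given in advance, and both must be discovered from the position stream.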

Paper Structure

This paper contains 43 sections, 3 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Discovering physical equations from visual cues without human intervention. Here, we show how an input video of projectile motion can be processed by our method to recover both the governing equation of motion and the two governing parameters, the horizontal and vertical initial velocities.
  • Figure 2: (a) Previous work [huang2018NIPSworkshop] requires both a temporal stream of bounding boxes and the physical parameters. (b) Our proposed technique also requires a stream of bounding boxes, but is able to discover latent parameters that correspond to true physical parameters, like velocity or angular frequency.
  • Figure 3: An overview of the proposed Visual Physics framework. We use a number of video clips as inputs to our system. The extracted position information is fed through the physics parameter extractor, which identifies the governing physical parameters for the phenomenon. These are used as inputs to the genetic programming step, in order to identify a human interpretable, closed form expression for the phenomenon.
  • Figure 4: Physical equations discovered by the Visual Physics framework on simulated videos. We show the observed embedding trends and the obtained equations, which both fit the observations accurately and take a human-interpretable form. Results are shown on three simulated datasets: ball toss, acceleration and circular motion.
  • Figure 5: Evaluating performance on real data, in two conditions. (a) Training and testing on real data. Videos of several basketball tosses are used as input to the pipeline. The accurate representations and the derived human-interpretable equations governing the real-world phenomenon are shown to emphasize the robustness of the pipeline. (b) The same approach, but with a training set of synthetic data. Similar performance is observed, which underscores that the reported results are not obtained from overfitting.
  • ...and 5 more figures