
PhysicsGen: Can Generative Models Learn from Images to Predict Complex Physical Relations?

Martin Spitznagel, Jan Vaillant, Janis Keuper

TL;DR

PhysicsGen investigates whether modern generative image models can learn complex physical relations from input-output image pairs. The authors release a benchmark of 300k image pairs spanning three physical simulation tasks and evaluate architectures including GANs, U-Nets, VAEs and diffusion models against ground-truth simulations based on partial differential equations (PDEs), measuring both accuracy and runtime. They report runtime speedups of up to $2\times 10^4$ for simple 0th- and 1st-order dynamics but substantial accuracy gaps for higher-order terms, underscoring the need for physics-informed losses. The work provides a scalable benchmark and dataset to guide the development of neural-enhanced physical simulations.

Abstract

Generative learning models have recently made significant progress in image-to-image translation, i.e. in estimating complex (steered) mappings between image distributions. While appearance-based tasks like image in-painting or style transfer have been studied at length, we propose to investigate the potential of generative models in the context of physical simulations. Providing a dataset of 300k image pairs and baseline evaluations for three different physical simulation tasks, we propose a benchmark to investigate the following research questions: i) Are generative models able to learn complex physical relations from input-output image pairs? ii) What speedups can be achieved by replacing differential-equation-based simulations? While baseline evaluations of different current models show the potential for high speedups (ii), these results also show strong limitations in physical correctness (i). This underlines the need for new methods to enforce physical correctness. Data, baseline models and evaluation code are available at http://www.physics-gen.org.

Paper Structure

This paper contains 39 sections, 22 equations, 37 figures, 14 tables.

Figures (37)

  • Figure 1: Overview of the physical problems, baseline generative models and their evaluation. A: We introduce three complex physical simulation tasks with 100k input-output image pairs each, providing ground-truth simulations based on differential equations of varying complexity. B: For each task, we evaluate independently trained image translation models; only results for the sound propagation task are visualized in this figure. C: While the evaluation of the baseline models shows a general ability of generative image models to learn physical relations from images, we observe significant performance drops for tasks that require a higher-order term in the differential equations of their simulation.
  • Figure 2: The sampling pipeline for the sound propagation dataset uses the NoiseModelling framework [noisemodelling_framework] to generate sound propagation maps for specific urban layouts. The generators are then trained to replicate these sound propagation patterns for given locations and source parameters. Predictions are evaluated by analyzing errors specifically in relation to the line of sight (see appendix app:sound_eval_metrics for details).
  • Figure 3: Qualitative results comparing the ground-truth simulation with the prediction for a single sample of the reflection task. Additional results can be found in appendix app:results.
  • Figure 4: Sampling and evaluation pipeline for the lens distortion dataset. The Brown-Conrady distortion model generates the true distorted images based on the parameters $p_1$ and $p_2$ (a chessboard pattern is shown for visualization). The conditioned generators are then trained to replicate these distortions for given images and parameters. The models are evaluated by comparing predicted and true facial landmarks, obtained with a 2D facial landmark detector based on the Face Alignment Network (FAN) [Bulat_2017].
  • Figure 5: Qualitative visualization of lens distortion predictions by different generative models for a CelebA sample with $p_2$ distortion. The original image is shown at the top left; the subsequent panels show the U-Net, Pix2Pix and diffusion model predictions. Red dots mark the true landmark positions and blue dots the predicted ones.
  • ...and 32 more figures
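The line-of-sight evaluation described for Figure 2 can be illustrated with a minimal sketch. It assumes a binary building map on the pixel grid, a single point source, line of sight decided by tracing a Bresenham line from the source to each pixel, building pixels excluded from scoring, and mean absolute error (MAE) as the per-region metric. All function names are hypothetical; this is not the benchmark's evaluation code, which is distributed via http://www.physics-gen.org.

```python
from typing import List, Tuple

Grid = List[List[int]]     # 1 = building (occluder), 0 = open space
Field = List[List[float]]  # e.g. a sound-level map on the same grid


def has_line_of_sight(buildings: Grid, source: Tuple[int, int],
                      target: Tuple[int, int]) -> bool:
    """True if no building cell lies strictly between source and target.

    The segment is rasterised with Bresenham's line algorithm; coordinates
    are (row, col) throughout.
    """
    (y, x), (y1, x1) = source, target
    dx, sx = abs(x1 - x), (1 if x < x1 else -1)
    dy, sy = -abs(y1 - y), (1 if y < y1 else -1)
    err = dx + dy
    while (y, x) != (y1, x1):
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x += sx
        if e2 <= dx:
            err += dx
            y += sy
        # Only intermediate cells can block the view, not the target itself.
        if (y, x) != (y1, x1) and buildings[y][x]:
            return False
    return True


def line_of_sight_mask(buildings: Grid, source: Tuple[int, int]) -> List[List[bool]]:
    """Boolean mask that is True for open cells visible from the source."""
    return [[not buildings[r][c] and has_line_of_sight(buildings, source, (r, c))
             for c in range(len(buildings[0]))]
            for r in range(len(buildings))]


def _mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else float("nan")


def los_nlos_mae(pred: Field, true: Field, buildings: Grid,
                 source: Tuple[int, int]) -> Tuple[float, float]:
    """MAE over line-of-sight and non-line-of-sight open cells, respectively."""
    mask = line_of_sight_mask(buildings, source)
    los, nlos = [], []
    for r, row in enumerate(true):
        for c, t in enumerate(row):
            if buildings[r][c]:
                continue  # building cells are not scored (assumption)
            (los if mask[r][c] else nlos).append(abs(pred[r][c] - t))
    return _mean(los), _mean(nlos)


# Usage: one building block casting a "shadow" behind it.
buildings = [[0] * 5 for _ in range(5)]
buildings[2][2] = 1
true = [[0.0] * 5 for _ in range(5)]
pred = [[0.5] * 5 for _ in range(5)]
pred[2][3] = pred[2][4] = 2.0  # larger errors in the shadowed cells
print(los_nlos_mae(pred, true, buildings, source=(2, 0)))  # (0.5, 2.0)
```

Splitting the error this way keeps a model from hiding poor performance in occluded regions, where reflections and diffraction dominate, behind a low average over the directly visible area. A sub-pixel or diffraction-aware visibility test could replace `has_line_of_sight` without changing the rest.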
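The Brown-Conrady model named in Figure 4 maps an undistorted point to its distorted position. The sketch below uses the common OpenCV-style parameterization, in which $k_1, k_2, k_3$ are radial and $p_1, p_2$ tangential coefficients acting on normalized coordinates measured from the distortion center. Whether the paper's $p_1$ and $p_2$ follow this convention is an assumption, and the pixel normalization in the hypothetical helper `distort_pixel` (image center at the origin, half the longer side mapped to 1) is an illustrative choice, not the dataset's.

```python
from typing import Tuple


def brown_conrady(x: float, y: float, k1: float = 0.0, k2: float = 0.0,
                  k3: float = 0.0, p1: float = 0.0, p2: float = 0.0) -> Tuple[float, float]:
    """Distort a normalised point (x, y) measured from the distortion centre.

    x_d = x (1 + k1 r^2 + k2 r^4 + k3 r^6) + 2 p1 x y + p2 (r^2 + 2 x^2)
    y_d = y (1 + k1 r^2 + k2 r^4 + k3 r^6) + p1 (r^2 + 2 y^2) + 2 p2 x y
    """
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d


def distort_pixel(u: float, v: float, width: int, height: int,
                  **coeffs: float) -> Tuple[float, float]:
    """Apply brown_conrady to pixel coordinates (u = column, v = row).

    Hypothetical normalisation: image centre -> origin, half the longer side -> 1.
    """
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    s = max(width, height) / 2.0
    x_d, y_d = brown_conrady((u - cx) / s, (v - cy) / s, **coeffs)
    return x_d * s + cx, y_d * s + cy


# Usage: a tangential p1 term shifts an on-axis point vertically; the centre stays fixed.
print(brown_conrady(0.5, 0.0, p1=0.1))                       # (0.5, 0.025)
print(distort_pixel(63.5, 63.5, 128, 128, p1=0.1, p2=0.1))  # (63.5, 63.5)
```

This only maps individual points; warping a whole image additionally requires resampling with interpolation, which is omitted here.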
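The lens distortion evaluation in Figure 4 compares predicted and true facial landmark positions. A minimal sketch of such a comparison is given below, assuming the score is the mean Euclidean distance between corresponding landmarks, optionally divided by a normalizing length (for example the inter-ocular distance, a common convention in face alignment). The paper's exact metric may differ, the function name is hypothetical, and the FAN-based detector itself is not reproduced.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def mean_landmark_error(pred: Sequence[Point], true: Sequence[Point],
                        norm: float = 1.0) -> float:
    """Mean Euclidean distance between corresponding landmarks, divided by norm."""
    if len(pred) != len(true) or not true:
        raise ValueError("expected equally many (and at least one) landmarks")
    if norm <= 0:
        raise ValueError("norm must be positive")
    total = sum(math.dist(p, t) for p, t in zip(pred, true))
    return total / (len(true) * norm)


# Usage: landmarks detected on the true and on the predicted distorted image.
true_pts = [(10.0, 10.0), (20.0, 10.0)]
pred_pts = [(10.0, 10.0), (23.0, 14.0)]  # second landmark is off by (3, 4)
print(mean_landmark_error(pred_pts, true_pts))             # 2.5
print(mean_landmark_error(pred_pts, true_pts, norm=10.0))  # 0.25
```

Measuring geometry through landmarks rather than pixel differences targets exactly what the distortion changes, namely where facial features end up, while staying insensitive to small intensity errors.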