Pretrained Video Models as Differentiable Physics Simulators for Urban Wind Flows

Janne Perini, Rafael Bischof, Moab Arar, Ayça Duran, Michael A. Kraus, Siddhartha Mishra, Bernd Bickel

Abstract

Designing urban spaces that provide pedestrian wind comfort and safety requires time-resolved Computational Fluid Dynamics (CFD) simulations, but their current computational cost makes extensive design exploration impractical. We introduce WinDiNet (Wind Diffusion Network), a pretrained video diffusion model that is repurposed as a fast, differentiable surrogate for this task. Starting from LTX-Video, a 2B-parameter latent video transformer, we fine-tune on 10,000 2D incompressible CFD simulations over procedurally generated building layouts. A systematic study of training regimes, conditioning mechanisms, and VAE adaptation strategies, including a physics-informed decoder loss, identifies a configuration that outperforms purpose-built neural PDE solvers. The resulting model generates full 112-frame rollouts in under a second. As the surrogate is end-to-end differentiable, it doubles as a physics simulator for gradient-based inverse optimization: given an urban footprint layout, we optimize building positions directly through backpropagation to improve wind safety as well as pedestrian wind comfort. Experiments on single- and multi-inlet layouts show that the optimizer discovers effective layouts even under challenging multi-objective configurations, with all improvements confirmed by ground-truth CFD simulations.

Paper Structure (53 sections, 15 equations, 21 figures, 8 tables)

Figures (21)

  • Figure 1: Overview of the proposed framework. (a) Procedurally generated urban layouts are simulated with a 2D incompressible Euler solver to produce training data. (b) A latent diffusion model with a physics-informed VAE is trained to generate wind field sequences conditioned on building footprint, inlet speed $u_\mathrm{in}$, and domain size $L$. (c) At inference, the model generates horizontal and vertical velocity fields $(u, v)$ and enables gradient-based inverse optimization of building layouts.
  • Figure 2: Channel decomposition of a single simulation frame. From left to right: encoded RGB composite, red channel encoding horizontal velocity $u$, green channel encoding vertical velocity $v$, and blue channel encoding the fluid mask ($1$: fluid, $0$: building). Velocity values are linearly rescaled to $[-1, 1]$ using the dataset-wide maximum speed $u_{\max}$.
  • Figure 3: Representative samples from the training dataset. Each tile shows the RGB-encoded velocity field at frame 100 for a distinct simulation.
  • Figure 4: Wind speed magnitude predicted by Dec. FT Physics for a procedurally generated urban layout at $15\,\mathrm{m/s}$ inlet velocity. Ground truth (left) and model prediction (right) at timesteps $t\!=\!0$, $56$, and $112$.
  • Figure 5: VAE reconstruction quality at $t{=}90$ for a sample from the test set.
  • ...and 16 more figures
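
The channel encoding described in the Figure 2 caption can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function names `encode_frame`, `decode_frame`, and `speed_magnitude` are hypothetical, and the sketch assumes that "linearly rescaled" means dividing by $u_{\max}$ and that the mask channel is stored as $0$/$1$ without further rescaling.

```python
import numpy as np


def encode_frame(u, v, fluid_mask, u_max):
    """Pack one simulation frame into an (H, W, 3) array.

    Channel 0 (red):   horizontal velocity u, scaled to [-1, 1]
    Channel 1 (green): vertical velocity v, scaled to [-1, 1]
    Channel 2 (blue):  fluid mask (1 = fluid, 0 = building)

    Assumption: the linear rescaling is a division by u_max.
    """
    red = np.clip(np.asarray(u, dtype=np.float32) / u_max, -1.0, 1.0)
    green = np.clip(np.asarray(v, dtype=np.float32) / u_max, -1.0, 1.0)
    blue = np.asarray(fluid_mask, dtype=np.float32)
    return np.stack([red, green, blue], axis=-1)


def decode_frame(frame, u_max):
    """Invert encode_frame: recover physical velocities and a boolean mask."""
    u = frame[..., 0] * u_max
    v = frame[..., 1] * u_max
    fluid_mask = frame[..., 2] > 0.5  # threshold in case the mask is not exactly 0/1
    return u, v, fluid_mask


def speed_magnitude(u, v):
    """Wind speed magnitude sqrt(u^2 + v^2), the quantity plotted in Figure 4."""
    return np.hypot(u, v)


if __name__ == "__main__":
    # Tiny 2x2 example with a hypothetical u_max of 30 m/s.
    u = np.array([[15.0, -7.5], [0.0, 30.0]])
    v = np.array([[0.0, 3.0], [-30.0, 0.0]])
    mask = np.array([[1, 1], [0, 1]])

    frame = encode_frame(u, v, mask, u_max=30.0)
    u_rec, v_rec, mask_rec = decode_frame(frame, u_max=30.0)
    print(frame.shape)
    print(speed_magnitude(u_rec, v_rec))
```

Scaling both velocity components by the same $u_{\max}$ keeps the direction of each velocity vector intact after encoding, so the decoded field differs from the original only by floating-point rounding.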