PhysiX: A Foundation Model for Physics Simulations

Tung Nguyen, Arsh Koneru, Shufan Li, Aditya Grover

TL;DR

Physics simulations suffer from data scarcity and domain heterogeneity, which hinders the scaling of foundation models. PhysiX introduces a 4.5B-parameter autoregressive transformer with a universal discrete tokenizer and a refinement module, pretrained with natural video priors and fine-tuned across eight physics tasks. It improves next-frame and long-horizon predictions, outperforming task-specific baselines on The Well benchmark and demonstrating effective transfer to unseen simulations. This work suggests that foundation-model-style multitask learning can yield general-purpose, scalable surrogates for diverse physical systems.

Abstract

Foundation models have achieved remarkable success across video, image, and language domains. By scaling up the number of parameters and training datasets, these models acquire generalizable world knowledge and often surpass task-specific approaches. However, such progress has yet to extend to the domain of physics simulation. A primary bottleneck is data scarcity: while millions of images, videos, and textual resources are readily available on the internet, the largest physics simulation datasets contain only tens of thousands of samples. This data limitation hinders the use of large models, as overfitting becomes a major concern. As a result, physics applications typically rely on small models, which struggle with long-range prediction due to limited context understanding. Additionally, unlike images, videos, or text, which typically exhibit fixed granularity, physics datasets often vary drastically in scale, amplifying the challenges of scaling up multitask training. We introduce PhysiX, the first large-scale foundation model for physics simulation. PhysiX is a 4.5B-parameter autoregressive generative model. It uses a discrete tokenizer to encode physical processes at different scales into a sequence of discrete tokens, and employs an autoregressive next-token prediction objective to model such processes in the token space. To mitigate the rounding error in the discretization process, PhysiX incorporates a specialized refinement module. Through extensive experiments, we show that PhysiX effectively addresses the data bottleneck, outperforming task-specific baselines under comparable settings as well as the previous absolute state-of-the-art approaches on The Well benchmark. Our results indicate that knowledge learned from natural videos can be successfully transferred to physics simulation, and that joint training across diverse simulation tasks enables synergistic learning.
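
The rounding error mentioned in the abstract can be illustrated with a minimal sketch in plain Python. This is not PhysiX's learned tokenizer: it assumes a uniform scalar quantizer with 16 levels as a simple stand-in, and the helper names `quantize` and `dequantize` are hypothetical. It only shows that mapping continuous values onto a finite set of tokens leaves a residual error, which is what a refinement step would have to correct.

```python
import math

def quantize(values, lo, hi, levels):
    """Map each value to the index of the nearest of `levels` evenly spaced bins in [lo, hi]."""
    step = (hi - lo) / (levels - 1)
    return [min(levels - 1, max(0, round((v - lo) / step))) for v in values]

def dequantize(tokens, lo, hi, levels):
    """Map token indices back to the continuous value at the center of each bin."""
    step = (hi - lo) / (levels - 1)
    return [lo + t * step for t in tokens]

# A smooth 1D "field" standing in for one row of a simulation frame (assumption).
field = [math.sin(2 * math.pi * i / 64) for i in range(64)]

tokens = quantize(field, -1.0, 1.0, 16)    # discrete token sequence
recon = dequantize(tokens, -1.0, 1.0, 16)  # de-tokenized reconstruction

# The worst-case rounding error of a uniform quantizer is half a bin width.
max_err = max(abs(a - b) for a, b in zip(field, recon))
half_step = (1.0 - (-1.0)) / (16 - 1) / 2
```

In this toy setting `max_err` never exceeds `half_step`. A learned tokenizer has no such simple bound, which is one plausible reason to train a dedicated refinement network rather than rely on a finer codebook alone.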

Paper Structure

This paper contains 24 sections, 1 equation, 10 figures, 7 tables.

Figures (10)

  • Figure 1: We propose PhysiX, a foundation model pretrained for physics simulations. We train PhysiX over a collection of 8 physics simulation tasks from The Well benchmark, resulting in a multi-task model that outperforms previous single-task baselines. For each task, we report VRMSE (lower is better) averaged across physical properties and across lead times of 9 to 26 frames.
  • Figure 2: The overall design of PhysiX. PhysiX consists of a video tokenizer, an autoregressive model, and a refinement network. Given input frames $x_1,\dots,x_N$, the tokenizer discretizes each frame into a sequence of discrete tokens, where the $j$th token of frame $i$ is denoted as $z_i^j$. The autoregressive model then generates predictions in this discrete token space, which are converted back to pixel-level predictions $\hat{x}$ by the de-tokenizer. A refinement module is incorporated to mitigate artifacts caused by the discretization error, such as blocky, pixelated outputs (visualized in yellow boxes), and to produce the final, sharper and more detailed output $\hat{y}$.
  • Figure 3: Long-horizon prediction performance. We visualize VRMSE (lower is better) across different lead times on the shear_flow, active_matter, and turbulent_radiative_layer datasets.
  • Figure 4: Effect of the refinement module. We apply the refinement module to both the multi-task and single-task AR models and study its effect on prediction errors. We report VRMSE and MSE (lower is better) over prediction windows ranging from 1 frame to 8 frames on the gray_scott_reaction_diffusion dataset.
  • Figure 5: Side-by-side qualitative comparison of PhysiX and baseline models. PhysiX outperforms the leading baseline models in long-horizon rollouts. At lead times of 24 and 15 steps for shear flow and Rayleigh–Bénard convection respectively, PhysiX maintains high-fidelity predictions across all physical fields, while the baseline models ConvNeXt-UNet and TFNO exhibit visible distortions and loss of detail.
  • ...and 5 more figures
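
The captions above report VRMSE without defining it. The sketch below assumes it denotes the variance-scaled root mean squared error, i.e. the MSE divided by the variance of the target field before taking the square root. The function name `vrmse`, the flat-list input, and the stabilizing constant `eps` are assumptions made for illustration, not details given in this summary.

```python
import math

def vrmse(pred, target, eps=1e-7):
    """Variance-scaled RMSE: sqrt(MSE(pred, target) / (Var(target) + eps))."""
    n = len(target)
    mean_t = sum(target) / n
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
    var = sum((t - mean_t) ** 2 for t in target) / n
    return math.sqrt(mse / (var + eps))

# Toy target field, flattened to a list of values (assumption).
target = [0.0, 1.0, 2.0, 3.0]

perfect = vrmse(target, target)                     # exact prediction
mean_baseline = vrmse([1.5] * len(target), target)  # always predict the target mean
```

Under this definition a perfect prediction scores 0 and a constant prediction equal to the target mean scores approximately 1. Scores are therefore comparable across fields with very different magnitudes, which matters when averaging across heterogeneous physics tasks.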