Towards Reasoning for PDE Foundation Models: A Reward-Model-Driven Inference-Time-Scaling Algorithm

Siddharth Mansingh, James Amarel, Ragib Arnab, Arvind Mohan, Kamaljeet Singh, Gerd J. Kunde, Nicolas Hengartner, Benjamin Migliori, Emily Casleton, Nathan A. Debardeleben, Ayan Biswas, Diane Oyen, Earl Lawrence

TL;DR

This work tackles the data- and compute-efficiency bottlenecks of PDE foundation models during long autoregressive rollouts. It introduces test-time computation (TTC), a beam-search style inference that generates $B$ candidate next steps per timestep and selects the best via reward signals, without training-time RL. Two reward-model strategies are proposed: Analytical Reward Models grounded in conservation laws and learned Process Reward Models trained with a contrastive triplet loss; a base ViT-based PDE operator with about $5\times10^6$ parameters is evaluated on PDEGym’s compressible Euler equations, achieving state-of-the-art downstream accuracy with only $6.25\%$ of the training data. The results show substantial data- and compute-efficiency gains and pave the way for RL-inspired, adaptive reasoning in scientific computing, while highlighting considerations for reward design and future integration of reinforcement learning signals in PDE modeling.
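The selection step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `toy_stochastic_step` is a hypothetical stand-in for the stochastic base model, channel 0 is assumed to hold density, and the analytical reward assumes that total mass is conserved on a uniform grid with periodic boundaries.

```python
import numpy as np


def mass_reward(prev_state, candidate):
    """Analytical reward (assumption): penalize drift in total mass.

    Channel 0 is assumed to be density on a uniform, periodic grid,
    so the sum over grid cells should stay constant between steps.
    """
    return -abs(candidate[0].sum() - prev_state[0].sum())


def greedy_ttc_rollout(step_fn, reward_fn, u0, n_steps, B, rng):
    """Autoregressive rollout with greedy best-of-B selection per timestep."""
    state = u0
    trajectory = [u0]
    for _ in range(n_steps):
        # Draw B candidate next states from the stochastic base model.
        candidates = [step_fn(state, rng) for _ in range(B)]
        # Score each candidate and keep only the highest-reward one.
        rewards = [reward_fn(state, c) for c in candidates]
        state = candidates[int(np.argmax(rewards))]
        trajectory.append(state)
    return np.stack(trajectory)


def toy_stochastic_step(state, rng):
    """Hypothetical base model: periodic shift plus small Gaussian noise."""
    return np.roll(state, 1, axis=-1) + 0.01 * rng.standard_normal(state.shape)


rng = np.random.default_rng(0)
u0 = np.ones((4, 8, 8))  # (channels, height, width)
traj = greedy_ttc_rollout(toy_stochastic_step, mass_reward, u0, n_steps=5, B=4, rng=rng)
```

With $B = 1$ the loop reduces to standard autoregressive inference, so the extra inference cost scales roughly linearly in $B$ base-model calls per timestep.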

Abstract

Partial Differential Equations (PDEs) are the bedrock of modern computational science and engineering, and are inherently computationally expensive to solve. While PDE foundation models have shown much promise for simulating such complex spatio-temporal phenomena, existing models remain constrained by their pretraining datasets and struggle with auto-regressive rollout performance, especially in out-of-distribution (OOD) cases. Furthermore, they have significant compute and training data requirements, which hamper their use in many critical applications. Inspired by recent advances in "thinking" strategies used in large language models (LLMs), we introduce the first test-time computing (TTC) strategy for PDEs that utilizes computational resources during inference to achieve more accurate predictions with fewer training samples and smaller models. We accomplish this with two types of reward models that evaluate predictions of a stochastic base model for spatio-temporal consistency. We demonstrate this method on compressible Euler-equation simulations from the PDEGym benchmark and show that TTC yields improved predictions relative to standard non-adaptive auto-regressive inference. This TTC framework marks a foundational step towards more advanced reasoning algorithms for PDE modeling, including building reinforcement-learning-based approaches, potentially transforming computational workflows in physics and engineering.
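The learned reward model is described as being trained with a contrastive triplet loss. The sketch below shows the standard triplet margin loss on embedding vectors. How the paper builds its triplets is not stated here, so the roles of anchor, positive, and negative (for example a reference state, a lower-error prediction, and a higher-error prediction) and the `margin` value are assumptions for illustration only.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss on batches of embeddings, shape (N, D).

    Pulls each positive closer to its anchor than the matching negative
    by at least `margin` (an assumed hyperparameter) in Euclidean distance.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()


# Hypothetical embeddings: the positive is close to the anchor, the negative is far.
anchor = np.array([[0.0, 0.0]])
positive = np.array([[0.0, 1.0]])
negative = np.array([[3.0, 4.0]])

well_separated = triplet_loss(anchor, positive, negative)  # margin already satisfied
swapped = triplet_loss(anchor, negative, positive)         # margin violated
```

A reward model trained this way can rank candidates by embedding distance without access to ground truth at inference time, which is what a per-step selection rule needs.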

Paper Structure

This paper contains 8 sections, 11 equations, 28 figures, 2 tables, 1 algorithm.

Figures (28)

  • Figure 1: a) Snapshots of Compressible Euler PDE dataset used for pretraining (RP, CRP, Gauss and KH) and downstream tasks (RM and RPUI). b) Greedy selection strategy to select the prediction with the best reward from a set of candidate predictions generated by the base foundational model. c) Outputs of the pretrained model are used for training the process reward model (PRM) using contrastive learning. d) Test Time Compute performance on CRP pretraining task. Reward Model Driven TTC provides substantial gains in MSE.
  • Figure 2: Rollout performance of the Greedy Selection Strategy on the CRP dataset for the Analytical Reward Model (ARM) and the Process Reward Model (PRM). As the branching factor $B$ is increased, the error between predictions and ground truth decreases at all rollout times. The PRM provides significantly larger error reduction than the ARM.
  • Figure 3: Sample Gain Ratio for the ViT-7 model on the downstream RPUI task when finetuned on different numbers of trajectories, using the ARM and the PRM. Both produce a monotonic improvement in MSE on a per-sample basis as the models are trained on an increasing number of trajectories.
  • Figure 4: MSE improvement across various branching factors for ViT-7 models finetuned on different numbers of RPUI trajectories. As $B$ is increased, the MSE of a model finetuned on $n_1$ trajectories approaches the MSE of a model finetuned on $n_2$ trajectories, where $n_1<n_2$.
  • Figure S1: Rollout Performance of models on CRP
  • ...and 23 more figures