Probabilistic Retrofitting of Learned Simulators

Cristiana Diaconu; Miles Cranmer; Richard E. Turner; Tanya Marwah; Payel Mukhopadhyay

TL;DR

The results show that probabilistic PDE modelling need not require retraining from scratch, but can be unlocked from existing deterministic backbones with modest additional training cost.

Abstract

Dominant approaches for modelling Partial Differential Equations (PDEs) rely on deterministic predictions, yet many physical systems of interest are inherently chaotic and uncertain. While training probabilistic models from scratch is possible, it is computationally expensive and fails to leverage the significant resources already invested in high-performing deterministic backbones. In this work, we adopt a training-efficient strategy to transform pre-trained deterministic models into probabilistic ones via retrofitting with a proper scoring rule: the Continuous Ranked Probability Score (CRPS). Crucially, this approach is architecture-agnostic: it applies the same adaptation mechanism across distinct model backbones with minimal code modifications. The method proves highly effective across different scales of pre-training: for models trained on single dynamical systems, we achieve 20-54% reductions in rollout CRPS and up to 30% improvements in variance-normalised RMSE (VRMSE) relative to compute-matched deterministic fine-tuning. We further validate our approach on a PDE foundation model, trained on multiple systems and retrofitted on the dataset of interest, to show that our probabilistic adaptation yields an improvement of up to 40% in CRPS and up to 15% in VRMSE compared to deterministic fine-tuning. Validated across diverse architectures and dynamics, our results show that probabilistic PDE modelling need not require retraining from scratch, but can be unlocked from existing deterministic backbones with modest additional training cost.
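
As a rough illustration of the scoring rule named in the abstract, the sketch below computes an empirical CRPS for an ensemble of predictions using the standard energy form, $\mathrm{CRPS} = \mathbb{E}|X - y| - \tfrac{1}{2}\mathbb{E}|X - X'|$. The function name `ensemble_crps`, the `fair` flag, and the choice to average over all grid points are assumptions made for illustration; the abstract does not state which estimator variant the paper uses.

```python
import numpy as np


def ensemble_crps(samples, target, fair=True):
    """Empirical CRPS of an ensemble against one target, averaged over grid points.

    samples: array of shape (M, ...) holding M ensemble members.
    target:  array of shape (...) holding the ground truth.
    fair:    if True, normalise the spread term by M * (M - 1) (assumed
             "fair" variant); otherwise by M * M.
    """
    samples = np.asarray(samples, dtype=float)
    target = np.asarray(target, dtype=float)
    m = samples.shape[0]
    if fair and m < 2:
        raise ValueError("the fair estimator needs at least two members")
    # Accuracy term: mean absolute error of the members against the target.
    skill = np.abs(samples - target[None]).mean(axis=0)
    # Spread term: absolute differences summed over all ordered member pairs.
    pair_sum = np.abs(samples[:, None] - samples[None, :]).sum(axis=(0, 1))
    spread = pair_sum / (m * (m - 1) if fair else m * m)
    return float((skill - 0.5 * spread).mean())
```

With a single member and `fair=False`, the spread term vanishes and the score reduces to the mean absolute error, which is why CRPS is often described as a probabilistic generalisation of MAE.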

Paper Structure (75 sections, 6 equations, 22 figures, 21 tables, 5 algorithms)


Introduction
Background
Deterministic Modelling
Related Work
Probabilistic PDE Modelling
Scoring Rules and CRPS
Methodology: Probabilistic Retrofitting
Noise Injection
Single-system Models
Foundation Model
Datasets
Evaluation Metrics
Results
Single-system models
Performance compared to deterministic fine-tuning
...and 60 more sections

Figures (22)

Figure 1: Comparison between deterministic and probabilistic retrofitting. Left (Deterministic): Existing models ($\mathcal{M}_\theta$) map input history $\mathbf{x}_{\text{history}}$ to a single output using MAE/MSE loss, often yielding smoothed, average predictions. Right (Our Approach): We introduce a stochastic branch where noise $\epsilon$ is projected via an MLP to modulate the backbone $\tilde{\mathcal{M}}_\theta$. The model is retrofitted using the CRPS loss. At inference, sampling multiple $\epsilon$ vectors generates an ensemble of sharp, diverse predictions for a single input history.
Figure 2: Improvement of CRPS retrofitting over deterministic fine-tuning for the single-system models: (Left) CRPS; (Right) VRMSE. The confidence intervals are obtained through $100$ bootstrapping iterations and use a confidence level of $68\%$.
Figure 3: Comparison of Ground Truth (GT), Deterministic-tuned baseline, and three independent samples from the CRPS-tuned ensemble (Walrus architecture) on the RB dataset. While the deterministic model blurs significantly at later time steps due to uncertainty averaging, the CRPS samples maintain fine-scale details throughout the rollout.
Figure 4: Error scaling. Log-log plot of rollout VRMSE (normalised by the single-member baseline) vs. ensemble size ($M$) for the Walrus model.
Figure 5: HalfWalrus Performance Analysis. Left: Improvement of CRPS retrofitting over deterministic fine-tuning across CRPS and VRMSE. Right: Error scaling with ensemble members.
...and 17 more figures
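
The Figure 2 caption reports confidence intervals from $100$ bootstrapping iterations at a $68\%$ confidence level. A minimal sketch of one common way to obtain such an interval, a percentile bootstrap over per-trajectory scores, is given below. The percentile method, the function name `bootstrap_interval`, and resampling at the trajectory level are assumptions; the caption does not specify these details.

```python
import numpy as np


def bootstrap_interval(values, n_boot=100, level=0.68, seed=0):
    """Percentile-bootstrap interval for the mean of `values`.

    values: 1D array of per-trajectory scores (assumed unit of resampling).
    Returns (mean, lower, upper).
    """
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    n = values.shape[0]
    # Draw n_boot resamples of size n, with replacement.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = values[idx].mean(axis=1)
    # A central 68% interval spans the 16th to 84th percentiles.
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(boot_means, [alpha, 1.0 - alpha])
    return float(values.mean()), float(lower), float(upper)
```

For roughly Gaussian bootstrap means, a $68\%$ level corresponds to about one standard error either side of the mean.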
