Direct Learning of Calibration-Aware Uncertainty for Neural PDE Surrogates

Carlos Stein Brito

TL;DR

Cross-regularized uncertainty learns uncertainty parameters during training using gradients routed through a held-out regularization split to reduce train-test mismatch, yielding regime-adaptive uncertainty without per-regime noise tuning.

Abstract

Neural PDE surrogates are often deployed in data-limited or partially observed regimes where downstream decisions depend on calibrated uncertainty in addition to low prediction error. Existing approaches obtain uncertainty through ensemble replication, fixed stochastic noise such as dropout, or post hoc calibration. Cross-regularized uncertainty learns uncertainty parameters during training using gradients routed through a held-out regularization split. The predictor is optimized on the training split for fit, while low-dimensional uncertainty controls are optimized on the regularization split to reduce train-test mismatch, yielding regime-adaptive uncertainty without per-regime noise tuning. The framework can learn continuous noise levels at the output head, within hidden features, or within operator-specific components such as spectral modes. We instantiate the approach in Fourier Neural Operators and evaluate on APEBench sweeps over observed fraction and training-set size. Across these sweeps, the learned predictive distributions are better calibrated on held-out splits and the resulting uncertainty fields concentrate in high-error regions in one-step spatial diagnostics.
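The split of responsibilities described in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the paper's implementation: it assumes a linear predictor in place of a Fourier Neural Operator, a single scalar output-noise level as the only uncertainty control, and a Gaussian negative log-likelihood as the regularization-split objective. The names `train_cross_regularized`, `log_sigma`, and `reg_every` are hypothetical.

```python
import numpy as np

def train_cross_regularized(x_tr, y_tr, x_reg, y_reg, steps=2000, lr=0.05, reg_every=5):
    """Alternate predictor updates (train split) with noise-scale updates (reg split)."""
    w = np.zeros(x_tr.shape[1])  # predictor parameters, stands in for theta
    log_sigma = 0.0              # scalar output-noise control, stands in for rho
    for step in range(steps):
        # Train update: gradient of mean squared error on the training split.
        resid_tr = x_tr @ w - y_tr
        w -= lr * 2.0 * (x_tr.T @ resid_tr) / len(y_tr)
        if (step + 1) % reg_every == 0:
            # Regularization update: gradient of the Gaussian NLL
            # 0.5 * r^2 * exp(-2 * log_sigma) + log_sigma with respect to
            # log_sigma only, evaluated on the held-out regularization split.
            mse_reg = np.mean((x_reg @ w - y_reg) ** 2)
            log_sigma -= lr * (1.0 - mse_reg * np.exp(-2.0 * log_sigma))
    return w, log_sigma

# Synthetic data (an assumption for illustration): true noise std is 0.3.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
x = rng.normal(size=(400, 3))
y = x @ w_true + 0.3 * rng.normal(size=400)
w, log_sigma = train_cross_regularized(x[:200], y[:200], x[200:], y[200:])
sigma = float(np.exp(log_sigma))  # learned predictive standard deviation
```

In this sketch the noise level never sees the training residuals, so it settles at the held-out error scale rather than the (typically smaller) training error scale.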

Paper Structure (35 sections, 12 equations, 11 figures, 2 tables)

Figures (11)

  • Figure 1: Cross-regularized dual-noise training. Train updates optimize $(\theta,\psi)$ on $\mathcal{D}_{\text{train}}$, and periodic regularization updates optimize $\rho$ on $\mathcal{D}_{\text{reg}}$.
  • Figure 2: Regression-mixture calibration in three representative regimes. Left: XReg reference regime with moment-matched regularization. Middle: low-observation XReg regime ($30\%$ observed). Right: model without learned regularization. In all panels, XReg tracks the diagonal more closely on held-out splits, with the largest gains under limited observations.
  • Figure 3: Optimization and uncertainty-allocation dynamics. Left: train and regularization losses for XReg versus the model without learned regularization (solid = train, dashed = reg). XReg maintains a substantially smaller gap between training and regularization losses. Right: pre-head and layerwise generalization-noise scales. Uncertainty is allocated non-uniformly across layers, indicating learned structure.
  • Figure 4: One-step teacher-forced maps (time indices $0$ to $100$): true field, absolute one-step error, generalization uncertainty ($\mathrm{Std}_{\omega}[\mu]$), and total predictive uncertainty. Error and uncertainty panels share the same numeric range (set by total-uncertainty limits) to enable direct spatial comparison. High-error regions co-localize with high predicted uncertainty, indicating spatially selective uncertainty assignment.
  • Figure 5: OTNO car-pressure experiment with cross-regularized uncertainty ($n_{\text{train}}=30,\;n_{\text{reg}}=50$). Left: absolute prediction error on the car surface. Middle: learned uncertainty field on the same sample. Right: roofline slice with ground truth, predictive mean, and uncertainty bands. Error hot spots and uncertainty hot spots are spatially aligned, and the slice shows calibrated widening of uncertainty in regions with larger residuals.
  • ...and 6 more figures
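
The calibration panels in Figure 2 compare curves against the diagonal. One common way to compute such a curve for Gaussian predictive distributions is through the probability integral transform; the sketch below assumes that construction, which may differ from the diagnostic used in the paper. The function name `calibration_curve` and the synthetic data are hypothetical.

```python
import math
import numpy as np

def calibration_curve(y, mu, sigma, levels):
    """Empirical frequency of targets falling below each predicted quantile level."""
    z = (np.asarray(y) - np.asarray(mu)) / np.asarray(sigma)
    # Probability integral transform: predicted Gaussian CDF evaluated at the target.
    pit = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    # A calibrated model gives mean(pit <= p) close to p, i.e. the diagonal.
    return np.array([np.mean(pit <= p) for p in levels])

rng = np.random.default_rng(0)
y = rng.normal(size=20000)           # targets drawn from N(0, 1)
levels = np.linspace(0.1, 0.9, 9)
calibrated = calibration_curve(y, 0.0, 1.0, levels)     # correct noise scale
overconfident = calibration_curve(y, 0.0, 0.5, levels)  # noise scale too small
```

With the correct noise scale the curve follows the diagonal, while an underestimated scale pushes too many targets into the tails, so the curve sits above the diagonal at low levels and below it at high levels.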