Text-Trained LLMs Can Zero-Shot Extrapolate PDE Dynamics, Revealing a Three-Stage In-Context Learning Mechanism

Jiajun Bao, Nicolas Boullé, Toni J. B. Liu, Raphaël Sarfati, Christopher J. Earls

TL;DR

The paper shows that text-trained LLMs can zero-shot extrapolate discretized PDE dynamics from serialized spatiotemporal data, without fine-tuning or natural language prompts. It identifies in-context scaling laws: prediction accuracy improves with longer temporal context but degrades with finer spatial discretization, and errors grow algebraically over multi-step rollouts. An entropy-based analysis of token-level outputs reveals a three-stage progression (syntax imitation, exploratory uncertainty, and consolidation) as the models internalize PDE structure purely from in-context exposure. The work indicates that pretrained LLMs encode numerical priors and invariants that enable coherent spatiotemporal predictions, offering a lens into emergent reasoning biases and a potential tool for probing how large language models handle numerical dynamics.

Abstract

Large language models (LLMs) have demonstrated emergent in-context learning (ICL) capabilities across a range of tasks, including zero-shot time-series forecasting. We show that text-trained foundation models can accurately extrapolate spatiotemporal dynamics from discretized partial differential equation (PDE) solutions without fine-tuning or natural language prompting. Predictive accuracy improves with longer temporal contexts but degrades at finer spatial discretizations. In multi-step rollouts, where the model recursively predicts future spatial states over multiple time steps, errors grow algebraically with the time horizon, reminiscent of global error accumulation in classical finite-difference solvers. We interpret these trends as in-context neural scaling laws, where prediction quality varies predictably with both context length and output length. To better understand how LLMs are able to internally process PDE solutions so as to accurately roll them out, we analyze token-level output distributions and uncover a consistent three-stage ICL progression: beginning with syntactic pattern imitation, transitioning through an exploratory high-entropy phase, and culminating in confident, numerically grounded predictions.
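To make "errors grow algebraically with the time horizon" concrete: algebraic growth means the rollout error behaves like $C\,n^{p}$ in the step count $n$, so it appears as a straight line of slope $p$ on log-log axes. The sketch below estimates $C$ and $p$ with an ordinary least-squares fit in log space. It is only an illustration: the function name `fit_power_law` and the synthetic error values are assumptions made here, not the authors' code or data.

```python
import math


def fit_power_law(steps, errors):
    """Fit errors ~ C * steps**p by least squares on log-log data.

    Returns (C, p). Hypothetical helper for illustration only.
    """
    xs = [math.log(n) for n in steps]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope of the regression line in log space is the growth exponent p.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx
    # Intercept in log space is log(C).
    C = math.exp(mean_y - p * mean_x)
    return C, p


# Synthetic rollout errors with a known exponent (assumed values, not paper results).
steps = list(range(1, 11))
errors = [0.01 * n ** 1.5 for n in steps]
C, p = fit_power_law(steps, errors)
print(f"C = {C:.4f}, p = {p:.4f}")
```

An exponential error law would instead be linear on semi-log axes, which is how the two growth regimes can be told apart from measured rollout errors.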

Paper Structure

This paper contains 46 sections, 27 equations, 25 figures, 1 table, 1 algorithm.

Figures (25)

  • Figure 1: Zero-shot PDE extrapolation workflow with LLMs. A reference PDE solution to the Allen--Cahn equation is discretized over space and time, quantized to 3-digit integers, and serialized into a token sequence with spatial and temporal delimiters. Each value and delimiter is mapped to a token. The LLM autoregressively generates future tokens from past context without fine-tuning or natural language prompting. The generated tokens are parsed and reconstructed into floating-point solutions. LLM-predicted rollouts and absolute errors are compared against a numerical solver.
  • Figure 2: In-context error scaling with temporal discretization (left) and spatial discretization (right). The top axes show $N_\mathrm{T}$ and $N_\mathrm{X}$, while the bottom axes show the equivalent LLM context and output lengths $N_{\mathrm{Tokens}}$, respectively. RMSE decreases with longer context, converging in the extended-context regime toward the local truncation behavior of first-order-in-time solvers (FTCS, IMEX). In contrast, errors grow with output length, following a capacity-dependent generalization trend. Shaded regions show 95% confidence intervals over 50 random initial conditions. The gray dotted line indicates the unavoidable quantization error floor defined in the Methodology section.
  • Figure 3: Multi-step prediction for randomly sampled initial conditions of two PDEs. The first row shows the Allen--Cahn equation, and the second shows the wave equation ($c=0.3$; see Supplementary Material Sect. \ref{subsec:appendix-other-pdes}). In each case, the region to the left of the dashed line is the input context provided to the LLM, and the region to the right is a 10-step autoregressive continuation from a single generation of each model. Classical finite difference solvers (FTCS for Allen--Cahn, leapfrog for wave) solve the corresponding initial value problem using the final in-context time slice as the initial condition, and advance the solution for 10 steps using the same spatial and temporal discretization as the LLMs. The final three columns report pointwise absolute errors relative to the reference solution.
  • Figure 4: Multi-step rollout error trends. RMSE grows algebraically with prediction steps $n$ (top axis) and equivalent LLM output length $N_{\mathrm{Tokens}}$ (bottom axis). Left: rollout from a single random initial condition (as in Fig. 3). Right: average over 20 random initial conditions. Error growth rates are estimated via log--log fits and reported on the right. Shaded regions denote 95% confidence intervals (left: across 20 repeated LLM runs; right: across 20 initial conditions).
  • Figure 5: Three-stage ICL progression and the evolution of predictive uncertainty. (a--b) Mean spatial entropy $\bar{H}$ vs. (a) temporal context length $N_\mathrm{T}$ at fixed $N_\mathrm{X} = 14$ and (b) output length $N_\mathrm{X}$ at fixed $N_\mathrm{T} = 50$. Shaded regions: 95% confidence intervals over 50 random initial conditions. (c) Token-level softmax distributions at three ICL stages: syntax-only ($N_\mathrm{T} = 2$), exploratory ($N_\mathrm{T} = 5$), and consolidation ($N_\mathrm{T} = 20$), extracted from Llama-3.1-8B for the same initial condition as the multi-step rollout example. Top 8 tokens (by probability) are shown per spatial position; only odd positions are displayed, with full results in Supplementary Material Sect. \ref{subsec:appendix-additional-distributions}. (d) Softmax over separator tokens. Early, high-confidence delimiter predictions are the signature of the syntax-only stage: the model acquires and stabilizes delimiter syntax with minimal context, and this high-confidence behavior over separators is preserved as the model subsequently learns the PDE dynamics.
  • ...and 20 more figures
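
The serialization pipeline described in the Figure 1 caption (discretize, quantize to 3-digit integers, join with spatial and temporal delimiters, then parse generated tokens back into floats) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the min-max scaling onto 0..999, the choice of `,` as spatial delimiter and `;` as temporal delimiter, and the function names `quantize`, `serialize`, and `deserialize` are all hypothetical.

```python
def quantize(value, lo, hi, levels=1000):
    """Map a float in [lo, hi] to an integer in [0, levels - 1] (assumed min-max scaling)."""
    frac = (value - lo) / (hi - lo)
    q = int(round(frac * (levels - 1)))
    return min(max(q, 0), levels - 1)  # clamp values that fall outside [lo, hi]


def serialize(grid, lo, hi, space_sep=",", time_sep=";"):
    """Turn a list of time slices (each a list of floats) into one delimited string."""
    rows = []
    for time_slice in grid:
        # Zero-pad so every value occupies exactly three digits.
        rows.append(space_sep.join(f"{quantize(v, lo, hi):03d}" for v in time_slice))
    return time_sep.join(rows)


def deserialize(text, lo, hi, space_sep=",", time_sep=";", levels=1000):
    """Parse a delimited string back into floats; inverse of serialize up to rounding."""
    grid = []
    for row in text.split(time_sep):
        grid.append([lo + int(tok) / (levels - 1) * (hi - lo)
                     for tok in row.split(space_sep)])
    return grid


# Two time slices on a three-point spatial grid, with values assumed to lie in [-1, 1].
grid = [[-1.0, 0.2, 1.0], [-0.5, 0.25, 0.5]]
text = serialize(grid, lo=-1.0, hi=1.0)
print(text)
recovered = deserialize(text, lo=-1.0, hi=1.0)
```

Under this scaling, the round trip cannot be more accurate than half a quantization step, `(hi - lo) / (2 * 999)`, which is one way to think about the quantization error floor mentioned in the Figure 2 caption.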