Diffusion Tempering Improves Parameter Estimation with Probabilistic Integrators for Ordinary Differential Equations

Jonas Beck; Nathanael Bosch; Michael Deistler; Kyra L. Kadhim; Jakob H. Macke; Philipp Hennig; Philipp Berens

TL;DR

The paper addresses gradient-based parameter estimation in nonlinear ODEs, where local minima and sensitivity to initialization hinder reliable optimization. It introduces diffusion tempering, a schedule-based regularization for probabilistic ODE solvers: optimization starts with a high diffusion $\kappa$, which smooths the loss surface, and $\kappa$ is then progressively lowered so that the loss increasingly emphasizes the true IVP solution, guiding optimization toward the global optimum. Empirical results on a simple pendulum and the Hodgkin–Huxley (HH) model show that diffusion tempering substantially improves convergence and parameter recovery over traditional Runge–Kutta (RK) based methods and the earlier Fenrir approach, including HH settings with many parameters. The approach offers a principled pathway to gradient-based, data-efficient parameter inference in complex dynamical systems, with practical impact for scientific modeling and systems biology.

Abstract

Ordinary differential equations (ODEs) are widely used to describe dynamical systems in science, but identifying parameters that explain experimental measurements is challenging. In particular, although ODEs are differentiable and would allow for gradient-based parameter optimization, the nonlinear dynamics of ODEs often lead to many local minima and extreme sensitivity to initial conditions. We therefore propose diffusion tempering, a novel regularization technique for probabilistic numerical methods which improves convergence of gradient-based parameter optimization in ODEs. By iteratively reducing a noise parameter of the probabilistic integrator, the proposed method converges more reliably to the true parameters. We demonstrate that our method is effective for dynamical systems of different complexity and show that it obtains reliable parameter estimates for a Hodgkin-Huxley model with a practically relevant number of parameters.
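The tempering loop described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the loss is a hand-made one-dimensional toy function, not the PN-approximated marginal likelihood, and the way its ripples shrink with `kappa` (amplitude `1 / (1 + kappa)`) is an invented stand-in for the smoothing effect of a high diffusion. The names `toy_loss_grad`, `gradient_descent`, and `diffusion_tempering` are hypothetical. Only the overall structure follows the text: optimize at a high $\kappa$, then lower $\kappa$ step by step, warm-starting each stage from the previous result.

```python
import math

THETA_TRUE = 2.0  # hypothetical "true" parameter of the toy problem
OMEGA = 3.0       # ripple frequency; the ripples create local minima


def toy_loss_grad(theta, kappa):
    """Gradient of a toy loss (NOT the PN marginal likelihood).

    loss = 0.05 * u**2 + a * (1 - cos(OMEGA * u)), with u = theta - THETA_TRUE
    and a = 1 / (1 + kappa). The dependence on kappa is an assumption:
    a large kappa flattens the ripples, a small kappa restores them.
    """
    u = theta - THETA_TRUE
    amplitude = 1.0 / (1.0 + kappa)
    return 0.1 * u + amplitude * OMEGA * math.sin(OMEGA * u)


def gradient_descent(theta, kappa, lr=0.1, steps=2000):
    """Plain gradient descent on the toy loss for one fixed kappa."""
    for _ in range(steps):
        theta -= lr * toy_loss_grad(theta, kappa)
    return theta


def diffusion_tempering(theta_init, kappas):
    """Optimize for each kappa in turn, warm-starting from the last result."""
    theta = theta_init
    for kappa in kappas:  # kappas must be ordered from high to low
        theta = gradient_descent(theta, kappa)
    return theta


# Hypothetical schedule: kappa decreases from 1e3 to 1e-3 in factors of 10.
kappas = [10.0 ** e for e in range(3, -4, -1)]
theta_init = -3.0

# Baseline: optimize only at the lowest kappa (rippled, multimodal loss).
theta_plain = gradient_descent(theta_init, kappas[-1])
# Tempered: walk down the schedule, reusing the previous estimate each time.
theta_tempered = diffusion_tempering(theta_init, kappas)

print("plain:", theta_plain, "tempered:", theta_tempered)
```

In this toy setting the baseline run stalls in a local minimum far from `THETA_TRUE`, while the tempered run ends close to it. The toy is deliberately simple (its global optimum does not move with `kappa`), so it shows only the schedule and warm-start mechanics, not the behavior of a probabilistic ODE solver.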

Paper Structure (32 sections, 26 equations, 11 figures, 7 tables, 1 algorithm)

Introduction
Parameter inference in ODEs
Classic Numerical Integration
Probabilistic Numerical Integration
Probabilistic Numerical IVP Solvers
PN-approximated Marginal Likelihood
Shortcomings
Method
Diffusion as Regularization
Diffusion tempering
Experiments
Exploration of a simple case: The 1D Pendulum
Exploration of a complex case: The Hodgkin–Huxley Model
Systematic performance benchmarking against alternative methods
Discussion
...and 17 more sections

Figures (11)

Figure 1: For a pendulum model, Fenrir produces parameter estimates with much lower trajectory root mean squared errors (tRMSE, Definition A3.1) than RK least-squares regression. However, for the more complex HH model, the tRMSE is very similar for both RK and Fenrir. Our proposed method is able to produce much better estimates for both problems.
Figure 2: PN posterior means for a range of $\kappa$ for a HH model. The parameters that generated the observation (inset, black) are different from the parameters of the ODE (inset, grey). For low $\kappa$ the mean closely adheres to the ODE solution, while for high values, it fits the observation much better. The maximum likelihood $\kappa$ is highlighted in red.
Figure 3: The effect of diffusion tempering on the marginal likelihood $\mathcal{M}(\theta, \kappa)$ of a pendulum with parameter $l$. A The negative log likelihood (nll) for a high $\kappa$ is very smooth with only one shallow global optimum (yellow). The likelihood for low $\kappa$ has one sharp global minimum at the true parameters (dashed), but also other local minima (blue, green). B IVP solutions for the different local optima marked in A and the true solution (dashed). C Diffusion tempering for an exemplary optimization run (black): Optimization starts with a high $\kappa$ (top) on a very smooth loss landscape. $\kappa$ is then progressively lowered, revealing a shallow global optimum near the true parameters $(\kappa=10^{16})$. Further lowering $\kappa$ finally reveals a sharp global optimum and the optimization converges correctly. By starting with the parameters of the previous optimization, diffusion tempering ensures that the optimization closes in on the correct global optimum.
Figure 4: Parameter estimation for the sodium $g_{Na}$ and potassium $g_{K}$ conductance of a HH model. A Circuit modeling components of a neuronal cell as electrical elements. The lipid bilayer acts as a capacitor ($C_m$). Ion channels are represented by resistors. Sodium and potassium conductances are voltage dependent ($g_{Na}$, $g_{K}$), and the leak conductance is constant ($g_{leak}$). The electrochemical gradients driving the flow of ions can be represented as voltage sources ($E_{Na}$, $E_{K}$, $E_{leak}$). B Noisy observation of the membrane voltage. C Solutions of the ODE for different combinations of parameters (black) compared to the true parameters (grey). The true parameters are highlighted in red. D Parameter optimization during four different stages of diffusion tempering for a random subset of initializations. Optimization trajectories (dotted) in each likelihood landscape are shown from start (blue) to convergence (orange). Very high loss values were clipped for better visual clarity. Plots for the full schedule and the corresponding solutions are provided in the appendix.
Figure 5: Convergence for a two-parameter HH model: Diffusion tempering converges more reliably than both non-linear least-squares regression using an RK solver and Fenrir with either a learned $\kappa$ or the best single $\kappa$. The histogram shows medians, with black bars indicating quartiles, for 100 runs split into groups of 10.
...and 6 more figures

Theorems & Definitions (3)

Definition A3.1: Trajectory RMSE
Definition A3.2: Relative Parameter RMSE
Definition A3.3: Summary feature
