Table of Contents
Fetching ...

Penalized Likelihood Parameter Estimation for Differential Equation Models: A Computational Tutorial

Matthew J Simpson, James S Bennett, Alexander Johnston, Ruth E Baker

TL;DR

The paper addresses the challenge of estimating parameters for ODE models from noisy data by adopting generalized profiling, a penalized-likelihood framework that uses trial splines to approximate the ODE solution while enforcing the governing equations. By maximizing $\ell(\boldsymbol{\theta}; T^{\textrm{o}}(t))= w_{\mathrm{d}}\ell_{\mathrm{d}} + w_{\mathrm{m}}\ell_{\mathrm{m}}$ and iteratively updating spline coefficients and parameters (often via Nelder–Mead), the method avoids repeatedly solving the ODE and scales better to larger systems. The paper demonstrates the approach on scalar ODEs (Newton's law of cooling and logistic growth) and a small reaction network, followed by a real-data coral reef recovery application, all using open-source Julia notebooks for reproducibility. It highlights connections to PINNs/BINNs and outlines promising extensions to PDEs and identifiability analysis, offering a practical, scalable toolbox for mechanistic parameter estimation. Overall, generalized profiling provides accurate parameter estimates with improved computational efficiency and robustness to initial-condition specification, making it attractive for complex, real-world dynamical systems.

Abstract

Parameter estimation connects mathematical models to real-world data and decision making across many scientific and industrial applications. Standard approaches such as maximum likelihood estimation and Markov chain Monte Carlo estimate parameters by repeatedly solving the model, which often requires numerical solutions of differential equation models. In contrast, generalized profiling (also called parameter cascading) focuses directly on the governing differential equation(s), linking the model and data through a penalized likelihood that explicitly measures both the data fit and model fit. Despite several advantages, generalized profiling is relatively rarely used in practice. This tutorial-style article outlines a set of self-directed computational exercises that facilitate skills development in applying generalized profiling to a range of ordinary differential equation models. All calculations can be repeated using reproducible open-source Jupyter notebooks that are available on GitHub.

Penalized Likelihood Parameter Estimation for Differential Equation Models: A Computational Tutorial

TL;DR

The paper addresses the challenge of estimating parameters for ODE models from noisy data by adopting generalized profiling, a penalized-likelihood framework that uses trial splines to approximate the ODE solution while enforcing the governing equations. By maximizing and iteratively updating spline coefficients and parameters (often via Nelder–Mead), the method avoids repeatedly solving the ODE and scales better to larger systems. The paper demonstrates the approach on scalar ODEs (Newton's law of cooling and logistic growth) and a small reaction network, followed by a real-data coral reef recovery application, all using open-source Julia notebooks for reproducibility. It highlights connections to PINNs/BINNs and outlines promising extensions to PDEs and identifiability analysis, offering a practical, scalable toolbox for mechanistic parameter estimation. Overall, generalized profiling provides accurate parameter estimates with improved computational efficiency and robustness to initial-condition specification, making it attractive for complex, real-world dynamical systems.

Abstract

Parameter estimation connects mathematical models to real-world data and decision making across many scientific and industrial applications. Standard approaches such as maximum likelihood estimation and Markov chain Monte Carlo estimate parameters by repeatedly solving the model, which often requires numerical solutions of differential equation models. In contrast, generalized profiling (also called parameter cascading) focuses directly on the governing differential equation(s), linking the model and data through a penalized likelihood that explicitly measures both the data fit and model fit. Despite several advantages, generalized profiling is relatively rarely used in practice. This tutorial-style article outlines a set of self-directed computational exercises that facilitate skills development in applying generalized profiling to a range of ordinary differential equation models. All calculations can be repeated using reproducible open-source Jupyter notebooks that are available on GitHub.
Paper Structure (8 sections, 22 equations, 6 figures)

This paper contains 8 sections, 22 equations, 6 figures.

Figures (6)

  • Figure 1: (a) Dashed curve shows $g(t) = 20 + 160 \, \textrm{exp}(-t/20)$ on $t \in [0, 100]$. Blue dots show $J=11$ equally-spaced noisy samples at $t=0,10,20,\ldots,100$. The $j$th data point is given by $T^{\textrm{o}}(t_j) = g(t_j) + \mathcal{N}(0,\sigma^2)$ for $j=1,2,3,\ldots,J$, with $\sigma=8$ for this data realization. The red curve is a cubic B-spline interpolation. (b) Unscaled cubic basis functions, $B_j(t)$ for $j=1,2,3,\ldots,J$. (c) Scaled cubic basis functions, $c_j B_j(t)$ for $j=1,2,3,\ldots,J$. The colors used to plot $B_j(t)$ in (b) are the same as the colors used to plot the corresponding $c_j B_j(t)$ in (c). Summing the scaled cubic basis functions in (c) gives the B-spline interpolation in (a), $f(t) = \sum_{j=1}^{J} c_j B_j(t)$ (red curve).
  • Figure 2: (a) Plots of $\textrm{d} B_j(t) / \textrm{d} t$ for $j=1,2,3,\ldots, J$ based on the unscaled cubic basis functions in Figure \ref{['fig:F1']}(b). The color of each $\textrm{d} B_j(t) / \textrm{d} t$ corresponds to the colors in Figure \ref{['fig:F1']}(b) for each cubic basis function, $B_j(t)$ for $j=1,2,3,\ldots, J$. (b) Scaled quadratic polynomials $c_j \textrm{d} B_j(t) / \textrm{d} t$ for $j=1,2,3,\ldots,J$. Each scaled quadratic polynomial is plotted using the same color as the corresponding unscaled quadratic polynomial in (a). Summing the scaled piecewise quadratic polynomials in (b) gives the red curve in (c), $\textrm{d}f(t)/ \textrm{d}t = \sum_{j=1}^{J} c_j \, \textrm{d}B_j(t) /\textrm{d}t$. This approximation is superimposed on the exact result $\textrm{d}g(t)/\textrm{d}t = -8\exp(-t/20)$. For this data set we have $J=11$.
  • Figure 3: Newton's law of cooling case study with synthetic data. (a) Data (blue dots) and over-fitted spline (red curve) from Figure \ref{['fig:F1']}(a). (b) Plot of $\xi(t;\boldsymbol{\theta}^{(1)})$ (purple curve) with $\boldsymbol{\theta}^{(1)} = \left(\alpha, T_{\textrm{a}},\sigma \right)^\top = \left(0.037, 17.229, \sigma \right)^\top$. Results in (c)--(d) illustrate the outcome after two iterations, with (c) showing the updated spline (red curve) superimposed on the data (blue dots) and (d) showing $\xi(t;\boldsymbol{\theta}^{(2)})$ (purple curve) with $\boldsymbol{\theta}^{(2)} = (\alpha, T_{\textrm{a}},\sigma)^\top= (0.044, 23.1100, 8.874)^\top$. Results in (e)--(f) illustrate the outcome after ten iterations, with (e) showing the updated spline (red curve) superimposed on the data (blue dots) and (f) showing $\xi(t;\boldsymbol{\theta}^{(10)})$ (purple curve) with $\boldsymbol{\theta}^{(10)} = (\alpha, T_{\textrm{a}},\sigma)^\top = (0.042, 20.833, 7.737)^\top$. Calculations are performed using $J=11$ and $K=1001$.
  • Figure 4: Logistic growth case study with synthetic data. (a) Data (blue dots) are collected at $t=0, 10, 20,\ldots, 100$ and interpolated using an over-fitted spline (red curve). (b) Plot of $\xi(t;\boldsymbol{\theta}^{(1)})$ (purple curve) with $\boldsymbol{\theta}^{(1)} = \left(\lambda, \kappa, \sigma\right)^\top = \left(0.104, 101.754, \sigma \right)^\top$. Results in (c)--(d) illustrate the outcome after ten iterations, with (c) showing the updated spline (red curve) superimposed on the data (blue dots) and (d) showing $\xi(t;\boldsymbol{\theta}^{(10)})$ (purple curve) with $\boldsymbol{\theta}^{(10)} = (\lambda, \kappa, \sigma)^\top = (0.108, 99.658, 3.841)^\top$. Calculations are performed using $J=11$ and $K=1001$.
  • Figure 5: CRN case study with synthetic data. (a) Data for $C_1^{\textrm{o}}(t)$ (blue dots) and $C_2^{\textrm{o}}(t)$ (blue squares) are collected at $t=0, 10, 20,\ldots, 100$ and interpolated using over-fitted splines $f_1(t)$ (red curve) and $f_2(t)$ (cyan curve). (b) Plot of $\xi_1(t;\boldsymbol{\theta}^{(1)})$ (red curve) and $\xi_2(t;\boldsymbol{\theta}^{(1)})$ (cyan curve) with $\boldsymbol{\theta}^{(1)} = (r_1, r_2, \sigma )^\top = (0.061,0.041, \sigma)^\top$. Results in (c)--(d) illustrate the outcome after ten iterations, with (c) showing the updated splines $f_1(t)$ (red curve) and $f_2(t)$ (cyan curve) superimposed on the same data. (d) Estimates of $\xi_1(t;\boldsymbol{\theta}^{(10)})$ (red curve) and $\xi_2(t;\boldsymbol{\theta}^{(10)})$ (cyan curve) with $\boldsymbol{\theta}^{(10)} = (r_1, r_2, \sigma )^\top = (0.059,0.040,0.238)^\top$. Calculations are performed using $J=11$ and $K=1001$.
  • ...and 1 more figures