Unveiling Scaling Laws of Parameter Identifiability and Uncertainty Quantification in Data-Driven Biological Modeling

Shun Wang, Wenrui Hao

TL;DR

A computational framework is presented that uncovers fundamental scaling laws governing practical identifiability through asymptotic analysis and provides the scaling laws for data-driven modeling in terms of both parameter identifiability and uncertainty, ensuring that data-driven inferences are grounded in verifiable biological reality.

Abstract

Integrating high-dimensional biological data into data-driven mechanistic modeling requires rigorous practical identifiability to ensure interpretability and generalizability. However, coordinate identifiability analysis often suffers from numerical instabilities near singular local minimizers. We present a computational framework that uncovers fundamental scaling laws governing practical identifiability through asymptotic analysis. By synthesizing Fisher information with perturbed Hessian matrices, we establish a hierarchical approach to quantify coordinate identifiability and inform uncertainty quantification within non-identifiable subspaces across different orders. Supported by rigorous mathematical analysis and validated on synthetic and real-world data, our framework was applied to HIV-host dynamics and spatiotemporal amyloid-beta propagation. These applications demonstrate the framework's efficiency in elucidating critical mechanisms underlying HIV diagnostics and Alzheimer's disease progression. In the era of large-scale mechanistic digital twins, our framework provides the scaling laws for data-driven modeling in terms of both parameter identifiability and uncertainty, ensuring that data-driven inferences are grounded in verifiable biological reality.
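The zero-order part of this idea can be illustrated with a minimal sketch: build a Fisher information matrix from model sensitivities, take its eigenvalue decomposition, and split the eigenvectors by a threshold. The toy model `(a + b) * exp(-c * t)`, the finite-difference Jacobian, the unit noise level `sigma`, and the helper names are all assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def model(theta, t):
    # Hypothetical toy model: a and b enter only through their sum,
    # so the direction (1, -1, 0) cannot be identified from data.
    a, b, c = theta
    return (a + b) * np.exp(-c * t)

def jacobian(theta, t, h=1e-6):
    # Central finite-difference sensitivities d y(t_j) / d theta_i.
    k = len(theta)
    J = np.empty((t.size, k))
    for i in range(k):
        step = np.zeros(k)
        step[i] = h
        J[:, i] = (model(theta + step, t) - model(theta - step, t)) / (2 * h)
    return J

def fisher_split(theta, t, sigma=1.0, tol=1e-3):
    # Fisher information under i.i.d. Gaussian noise (an assumption).
    J = jacobian(theta, t)
    F = J.T @ J / sigma**2
    eigvals, U = np.linalg.eigh(F)  # eigenvalues in ascending order
    identifiable = eigvals > tol
    return eigvals, U[:, identifiable], U[:, ~identifiable]

theta_star = np.array([1.0, 1.0, 0.5])
t = np.linspace(0.0, 5.0, 20)
eigvals, U_id, U_non = fisher_split(theta_star, t)
print("eigenvalues:", eigvals)
print("non-identifiable direction:", U_non.ravel())
```

In this toy case the single eigenvector below the threshold is proportional to `(1, -1, 0)`, which matches the fact that only the sum `a + b` affects the output.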

Paper Structure

This paper contains 34 sections, 7 theorems, 126 equations, 5 figures, 8 tables, 1 algorithm.

Key Result

Theorem 1

The matrix $F+\varepsilon H$ can be approximated to third order in $\varepsilon$. The practically identifiable and non-identifiable parameters at each order of $\varepsilon$ are: (1) Zero order in $\varepsilon$ ($O(1)$): the non-identifiable parameter is $U_{k-r_0}^{(0)\top}\boldsymbol{\theta}$ and the practically identifiable parameter is $U_{r_0}^{(0)\top}\boldsymbol{\theta}$ ...
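Since the theorem statement is truncated here, the following is only a sketch of the general mechanism, based on standard degenerate perturbation theory rather than on the paper's own construction: eigenvalues of $F+\varepsilon H$ that vanish at zero order equal, up to $O(\varepsilon^2)$, $\varepsilon$ times the eigenvalues of $H$ projected onto the null space of $F$. The function name, the example matrices, and the reuse of a single threshold `tol` at both orders are assumptions.

```python
import numpy as np

def hierarchical_split(F, H, tol=1e-3):
    # Zero order: eigenvectors of F with eigenvalue above tol.
    lam0, U = np.linalg.eigh(F)
    U_r0 = U[:, lam0 > tol]
    U_null = U[:, lam0 <= tol]
    # First order: H restricted to the zero-order null space.
    H_proj = U_null.T @ H @ U_null
    lam1, V = np.linalg.eigh(H_proj)
    keep = np.abs(lam1) > tol
    U_r1 = U_null @ V[:, keep]     # resolved at O(eps)
    U_rest = U_null @ V[:, ~keep]  # still unresolved at O(eps)
    return U_r0, U_r1, U_rest, lam1

# Hypothetical 3-parameter example: F has rank 1, H resolves one more direction.
F = np.diag([2.0, 0.0, 0.0])
H = np.array([[0.5, 0.3, 0.1],
              [0.3, 1.0, 0.0],
              [0.1, 0.0, 0.0]])
eps = 1e-4

U_r0, U_r1, U_rest, lam1 = hierarchical_split(F, H)
full = np.sort(np.linalg.eigvalsh(F + eps * H))
print("small eigenvalues of F + eps*H, divided by eps:", full[:2] / eps)
print("eigenvalues of the projected H:", lam1)
```

Comparing the two printed vectors shows how the projected matrix predicts the small eigenvalues of the perturbed matrix. Directions left in `U_rest` would need a higher-order analysis.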

Figures (5)

  • Figure 1: Illustration of the scaling law of parameter identifiability. The schematic framework for higher-order parameter identifiability analysis integrates eigenvalue decomposition (EVD) and the Schur complement (SC) to categorize parameter identifiability across hierarchical scales. The eigenvalue matrix $\Sigma$ is color-coded to denote the order of identifiability: red indicates zero-order identifiability [$\mathcal{O}(1)$], blue represents first-order identifiability [$\mathcal{O}(\varepsilon)$], and gray corresponds to second-order identifiability [$\mathcal{O}(\varepsilon^2)$]. Within the eigenvector matrix $U$, these spectral regimes define specific parameter combinations: zero-order identifiable coordinates ($\boldsymbol{U_{r_0}^\top \theta}$, red), first-order identifiable coordinates ($\boldsymbol{U_{r_1}^\top \theta}$, blue), and second-order identifiable coordinates ($\boldsymbol{U_{k-r_0-r_1}^\top \theta}$, gray). The metric $\mathcal{K}_i$ is used to determine higher-order coordinate practical identifiability. Furthermore, the higher-order uncertainty quantification (UQ) framework evaluates the predictive uncertainty originating from non-identifiable subspaces, specifically isolating contributions from zero-order non-identifiable parameters ($\boldsymbol{U_{k-r_0}^\top \theta}$, red region) and first-order non-identifiable parameters ($\boldsymbol{U_{k-r_0-r_1}^\top \theta}$, blue region).
  • Figure 2: Validation of method accuracy in polynomial fitting. (A) Coordinate identifiability analysis at $\boldsymbol{\theta^*}=[2,0,0,0]^T$ using the profile likelihood. (B) The metrics $\mathcal{K}_i$ used for practical identifiability analysis. (C) Eigenvalues from the $\varepsilon$-order practical identifiability analysis and heatmap of the eigenvector matrix. The dashed line is the threshold $\epsilon=10^{-3}$. The color bar represents the value of each eigenvector element. The shaded area indicates the eigenvectors corresponding to $\varepsilon$-order non-identifiable parameters. (D) UQ from perturbations of $\varepsilon$-order non-identifiable parameters. Circles represent the synthetic data generated from the polynomial function. The solid line represents the polynomial function with the given parameter values $\boldsymbol{\theta^*}$. The red and green shaded regions represent the 95% confidence intervals under zero-order and first-order $\varepsilon$ perturbations of the non-identifiable parameters, respectively.
  • Figure 3: Higher-order parameter identifiability analysis of HIV virus-host dynamics. (A) Schematic of the HIV infection model. (B) Eigenvalues from the $\varepsilon$-order practical identifiability analysis and heatmap of the corresponding eigenvector matrix. The dashed line indicates the threshold $\epsilon = 10^{-3}$. The color bar represents the magnitude of each eigenvector element, and the shaded area highlights eigenvectors corresponding to $\varepsilon$-order non-identifiable parameters. (C) Uncertainty quantification (UQ) from perturbations of $\varepsilon$-order non-identifiable parameters. Circles represent clinical log-transformed plasma HIV concentrations (Stafford et al., 2000), and the solid line shows reconstructed dynamics using $\boldsymbol{\theta^*}$ from Table S1. The red and green shaded regions denote 95% confidence intervals for zero-order and first-order $\varepsilon$ perturbations, respectively. (D) Metrics $\mathcal{K}_i$ used for higher-order practical identifiability analysis. (E) Coordinate-wise identifiability analysis for parameters $\pi$ and $c$ using the profile likelihood method.
  • Figure 4: Hierarchical parameter identifiability of Amyloid-$\beta$ ($A\beta$) spatiotemporal dynamics. (A) Image preprocessing pipeline for constructing the graph Laplacian matrix. (B) Reconstruction of $A\beta$ spatiotemporal dynamics using the network-based PDE model. The color bar represents the standardized uptake value ratios (SUVRs) of $A\beta$. (C) Uncertainty quantification (UQ) from perturbations of $\varepsilon$-order non-identifiable parameters. Circles denote observed $A\beta$ SUVRs, and the solid line shows reconstructed dynamics using $\boldsymbol{\theta^*}$ from Tables S3--S9. The red and green shaded regions represent 95% confidence intervals for zero-order and first-order $\varepsilon$ perturbations, respectively. (D) Heatmap of the eigenvector matrix. The dashed line indicates the threshold $\epsilon = 2\times10^{-3}$. The color bar represents the magnitude of each eigenvector element, and the shaded area highlights eigenvectors corresponding to $\varepsilon$-order non-identifiable parameters. (E) Metrics $\mathcal{K}_i$ used for higher-order practical identifiability analysis of the first 17 brain regions.
  • Figure S1: Validation method accuracy in polynomial fitting. (A) Eigenvalues from the $\varepsilon$-order practical identifiability analysis and heatmap of the eigenvector matrix. The dashed line is the threshold $\epsilon=10^{-3}$. (B) The metrics $\mathcal{K}_i$ used for practical identifiability analysis of the remaining 51 brain regions.
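The uncertainty bands described in the captions above can be sketched as a Monte Carlo experiment: perturb the fitted parameters along a chosen set of directions, push each sample through the model, and take pointwise percentiles. This is a generic illustration, not the paper's procedure. The toy model, the Gaussian perturbation with standard deviation `scale`, and the helper `prediction_band` are all assumptions.

```python
import numpy as np

def model(theta, t):
    # Hypothetical toy model: only the sum a + b affects the output.
    a, b, c = theta
    return (a + b) * np.exp(-c * t)

def prediction_band(theta_star, directions, t, scale=0.1, n_samples=500, seed=0):
    # Sample perturbations theta* + U z with z ~ N(0, scale^2 I),
    # where the columns of `directions` span the perturbed subspace.
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, scale, size=(n_samples, directions.shape[1]))
    thetas = theta_star + z @ directions.T
    preds = np.array([model(th, t) for th in thetas])
    # Pointwise 95% band across the sampled trajectories.
    lower, upper = np.percentile(preds, [2.5, 97.5], axis=0)
    return lower, upper

theta_star = np.array([1.0, 1.0, 0.5])
t = np.linspace(0.0, 5.0, 20)
u_non = np.array([[1.0], [-1.0], [0.0]]) / np.sqrt(2.0)  # leaves a + b fixed
u_id = np.array([[1.0], [1.0], [0.0]]) / np.sqrt(2.0)    # changes a + b

lo_non, hi_non = prediction_band(theta_star, u_non, t)
lo_id, hi_id = prediction_band(theta_star, u_id, t)
print("max band width, non-identifiable direction:", np.max(hi_non - lo_non))
print("max band width, identifiable direction:", np.max(hi_id - lo_id))
```

In this toy model the non-identifiable direction is exact, so its band collapses to numerical zero. For practically (rather than structurally) non-identifiable parameters, as in the figures above, the band would be narrow but finite.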

Theorems & Definitions (16)

  • Definition 1
  • Definition 2
  • Theorem 1
  • Corollary 1
  • Definition 3
  • Theorem 2
  • Corollary 2
  • Corollary 3
  • Corollary 4
  • Theorem 3
  • ...and 6 more