Estimation of System Parameters Including Repeated Cross-Sectional Data through Emulator-Informed Deep Generative Model

Hyunwoo Cho, Sung Woong Cho, Hyeontae Jo, Hyung Ju Hwang

TL;DR

This work tackles parameter estimation for dynamical systems using repeated cross-sectional (RCS) data, where heterogeneity hampers traditional optimization and GP-based calibration. It introduces the Emulator-Informed Deep Generative Model (EIDGM), which combines a HyperPINN-based DE solver emulator with a Wasserstein GAN to learn the full posterior distribution of parameters that generate the observed RCS data. Validations on exponential growth, logistic, and Lorenz systems show EIDGM’s ability to recover complex, multimodal parameter distributions, and a real-world application to Amyloid-β biomarkers demonstrates its capability to capture diverse operating patterns from limited data. Overall, EIDGM offers a scalable, information-rich framework for mechanistic inference in dynamical systems from cross-sectional observations, with potential impact across biology, economics, and political science.

Abstract

Differential equations (DEs) are crucial for modeling the evolution of natural or engineered systems. Traditionally, the parameters in DEs are adjusted to fit data from system observations. However, in fields such as politics, economics, and biology, available data are often independently collected at distinct time points from different subjects (i.e., repeated cross-sectional (RCS) data). Conventional optimization techniques struggle to accurately estimate DE parameters when RCS data exhibit various heterogeneities, leading to a significant loss of information. To address this issue, we propose a new estimation method called the emulator-informed deep-generative model (EIDGM), designed to handle RCS data. Specifically, EIDGM integrates a physics-informed neural network-based emulator that immediately generates DE solutions and a Wasserstein generative adversarial network-based parameter generator that can effectively mimic the RCS data. We evaluated EIDGM on exponential growth, logistic population models, and the Lorenz system, demonstrating its superior ability to accurately capture parameter distributions. Additionally, we applied EIDGM to an experimental dataset of Amyloid beta 40 and beta 42, successfully capturing diverse parameter distribution shapes. This shows that EIDGM can be applied to model a wide range of systems and extended to uncover the operating principles of systems based on limited data.
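
To make the notion of RCS data concrete, the following is a minimal sketch and is not taken from the paper. It assumes an exponential growth model $y' = p\,y$ with a hypothetical bimodal distribution of growth rates (modes at 0.5 and 1.5), and all function names and numeric values are illustrative assumptions. At each time point a fresh set of subjects is observed, so no trajectory is tracked over time. A single-parameter least-squares fit then lands between the two modes, which illustrates the kind of information loss the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_growth_rates(n, rng):
    """Draw n growth rates from a hypothetical bimodal mixture (assumed values)."""
    modes = rng.choice([0.5, 1.5], size=n)
    return modes + 0.05 * rng.standard_normal(n)

def simulate_rcs(times, n_per_time, y0, rng):
    """Observe a fresh, independent set of subjects at every time point."""
    data = {}
    for t in times:
        p = sample_growth_rates(n_per_time, rng)  # new subjects each time
        data[float(t)] = y0 * np.exp(p * t)       # closed-form solution of y' = p*y
    return data

times = np.linspace(0.5, 2.0, 4)
rcs = simulate_rcs(times, n_per_time=200, y0=1.0, rng=rng)

# Naive single-parameter fit: with y0 = 1, log y = p * t, so regress the
# log of the cross-sectional mean on t through the origin.
t_arr = np.array(list(rcs.keys()))
log_mean = np.array([np.log(v.mean()) for v in rcs.values()])
p_hat = float(np.sum(t_arr * log_mean) / np.sum(t_arr ** 2))
print(f"single-parameter estimate: {p_hat:.3f} (true modes: 0.5 and 1.5)")
```

The point estimate describes neither subpopulation, which is why a method that recovers the whole parameter distribution is of interest here.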

Paper Structure (24 sections, 8 theorems, 46 equations, 5 figures, 7 tables)

Key Result

Theorem A.1

Suppose that $f[\mathbf{y}(t), \mathbf{p}, t]$ is Lipschitz continuous in $\mathbf{y}$ with Lipschitz constant $L > 0$. Assume that the neural network $nn(t)$ satisfies $\left|\mathbf{y}\left(t_{1}; \mathbf{p}\right)-nn\left(t_{1}; \theta_{nn}\right)\right| \leq \delta$ for some $\delta \geq 0$,

Figures (5)

  • Figure 1: Schematic of the Emulator-Informed Deep-Generative Model (EIDGM). The generator in the WGAN is trained to produce candidate parameter sets. These parameters are fed to the HyperPINN, which immediately produces the corresponding solutions. The loss is then computed through the WGAN discriminator, which measures the difference between the distribution of these solutions and that of the RCS data. By minimizing the loss, the generator learns to produce precise parameter distributions of the DE.
  • Figure 2: Visualization of parameter estimates for exponential growth model with three different emulators: Gaussian Process (GP), DeepONet (with WGAN), and EIDGM (HyperPINN with WGAN). (a) If data are obtained from an exponential model with one parameter peak (RCS Data), all methods accurately estimate the underlying parameters (Right, Estimation). (b), (c) When data are obtained with multiple parameter peaks, GP fails to estimate the underlying parameter distribution, while the Emulator+WGAN models accurately estimate the parameters.
  • Figure 3: Visualization of estimation results for logistic population model with three different emulators: GP, DeepONet, and HyperPINN+WGAN (EIDGM). (a) If data are obtained from a logistic model with one parameter peak (Left, Observation), all methods accurately estimate the underlying parameters (Right, Estimation). (b), (c) Only EIDGM accurately estimates the parameters when there are two or three parameter peaks.
  • Figure 4: Estimation of parameter distribution for the Lorenz system using EIDGM. This model describes the temporal concentration profiles of three components: $X$, $Y$, and $Z$ (left three images in each row). Three types of parameter distributions are considered: unimodal (a), bimodal (b), and trimodal (c). For each type, we employed RCS data for three different populations (left three images) and presented the estimation results for each parameter, $\sigma$, $\rho$, and $\beta$ (rightmost image) among the six parameters included in the model.
  • Figure 5: Estimation results for amyloid beta accumulation in real experimental datasets using a logistic model with EIDGM. (a-b) We utilized EIDGM with two distinct datasets (left): A$\beta$40 and A$\beta$42 (left red dots). The parameter estimates revealed different patterns (right black histograms), and the corresponding trajectories accurately matched the experimental data (left translucent black lines) (a). Similar patterns were observed with the other dataset (b).
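
The generator, emulator, and discriminator loop described in the Figure 1 caption can be illustrated with a heavily simplified sketch. This is not the authors' implementation. It assumes three substitutions: the closed-form solution of $y' = p\,y$ stands in for the HyperPINN emulator, an affine map of Gaussian noise stands in for the WGAN generator, and the empirical one-dimensional Wasserstein-1 distance between sorted samples stands in for the learned WGAN discriminator. A grid search over the generator mean replaces gradient-based training. All names and numeric values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
Y0 = 1.0
TIMES = np.array([0.5, 1.0, 1.5, 2.0])

def generator(z, mu, sigma):
    """Toy generator (assumption): affine map from noise z to growth rates."""
    return mu + sigma * z

def emulator(p, t):
    """Stand-in for the HyperPINN: closed-form solution of y' = p*y, y(0) = Y0."""
    return Y0 * np.exp(p * t)

def w1_distance(a, b):
    """Empirical 1-D Wasserstein-1 distance between two equal-size samples."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

def rcs_loss(mu, sigma, z, observed):
    """Sum of per-time-point distances between emulated and observed samples."""
    p = generator(z, mu, sigma)
    return sum(w1_distance(emulator(p, t), observed[i])
               for i, t in enumerate(TIMES))

# Synthetic RCS data: independent subjects at each time, true rates ~ N(1.0, 0.1).
n = 500
observed = [emulator(1.0 + 0.1 * rng.standard_normal(n), t) for t in TIMES]

# "Training" by grid search over the generator mean, with shared noise z.
z = rng.standard_normal(n)
candidates = np.linspace(0.2, 1.8, 17)
losses = [rcs_loss(mu, 0.1, z, observed) for mu in candidates]
best_mu = float(candidates[int(np.argmin(losses))])
print(f"best generator mean: {best_mu:.2f}")
```

The sketch only shows how a distributional loss on emulated solutions can steer the parameter distribution. The multimodal cases in Figures 2 to 4 would need a more expressive generator than this affine map.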

Theorems & Definitions (13)

  • Theorem A.1
  • Definition A.1
  • Theorem A.2
  • Corollary 1
  • proof
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Theorem A.3
  • ...and 3 more