
Learning from Imperfect Data: Robust Inference of Dynamic Systems using Simulation-based Generative Model

Hyunwoo Cho, Hyeontae Jo, Hyung Ju Hwang

TL;DR

This work tackles the robust inference of parameters and the recovery of missing components in nonlinear dynamic systems when data are noisy, sparse, or partially observed. It introduces SiGMoID, a hybrid framework that combines a HyperPINN-based ODE solver with a Wasserstein GAN (W-GAN). The framework can (i) quantify observation noise, (ii) estimate system parameters, and (iii) reconstruct unobserved state variables. It works both from noisy and sparse (NS) data and from noisy, sparse data with missing components (NSMC). Observations are modeled as $\mathbf{y}^{o}(t)=\mathbf{y}(t)+\mathbf{e}(t)$, where $\mathbf{y}(t)$ is the true ODE solution and $\mathbf{e}(t)$ is observation noise. The method was tested on the FitzHugh-Nagumo (FN), protein transduction, Hes1, and Lorenz systems. Across all of them, SiGMoID consistently yields more accurate parameter estimates than several existing approaches and accurately reconstructs hidden dynamics. Its modular design and demonstrated scalability suggest broad applicability to data-limited domains such as virology, epidemiology, and cell signaling, where discovery must proceed from incomplete observations. By integrating physics-informed surrogates with distribution-matching generative models, the method provides a practical, robust tool for dynamic-system inference in settings, such as NSMC data, where traditional methods struggle.
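The sketch below makes the observation model $\mathbf{y}^{o}(t)=\mathbf{y}(t)+\mathbf{e}(t)$ concrete by generating NS and NSMC data for the FN model. It is a minimal illustration under stated assumptions, not the authors' experimental setup. The following are all assumptions chosen for illustration:

- the FitzHugh-Nagumo form $\dot V = c(V - V^3/3 + R)$, $\dot R = -(V - a + bR)/c$;
- the values $a=b=0.2$, $c=3$, $(V_0,R_0)=(-1,1)$;
- i.i.d. Gaussian noise;
- the hypothetical helper names `fitzhugh_nagumo` and `make_observations`.

```python
import numpy as np
from scipy.integrate import solve_ivp


def fitzhugh_nagumo(t, y, a, b, c):
    """FitzHugh-Nagumo right-hand side (assumed textbook form)."""
    v, r = y
    dv = c * (v - v**3 / 3.0 + r)
    dr = -(v - a + b * r) / c
    return [dv, dr]


def make_observations(params=(0.2, 0.2, 3.0), y0=(-1.0, 1.0), t_end=20.0,
                      n_obs=21, n_rep=50, noise_sd=0.1, observed=(0, 1), seed=0):
    """Return sparse times, the true solution, and noisy replicates y^o = y + e.

    `observed` lists the state components that are measured; the others are
    dropped entirely, which mimics the missing-component (NSMC) setting.
    """
    rng = np.random.default_rng(seed)
    t_obs = np.linspace(0.0, t_end, n_obs)            # sparse observation grid
    sol = solve_ivp(fitzhugh_nagumo, (0.0, t_end), y0, t_eval=t_obs,
                    args=params, rtol=1e-8, atol=1e-8)
    y_true = sol.y.T                                    # shape (n_obs, 2)
    # Additive Gaussian noise e(t), independent across replicates and times (assumption)
    noise = rng.normal(0.0, noise_sd, size=(n_rep, n_obs, y_true.shape[1]))
    y_obs = y_true[None, :, :] + noise                 # shape (n_rep, n_obs, 2)
    return t_obs, y_true, y_obs[:, :, list(observed)]


t_ns, y_true, y_ns = make_observations(observed=(0, 1))   # NS: V and R observed
t_mc, _, y_nsmc = make_observations(observed=(0,))        # NSMC: R missing
print(y_ns.shape, y_nsmc.shape)  # (50, 21, 2) (50, 21, 1)
```

Both calls share a seed, so the observed $V$ replicates are identical in the two settings. Only the presence of $R$ differs.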

Abstract

System inference for nonlinear dynamic models, represented by ordinary differential equations (ODEs), remains a significant challenge in many fields, particularly when the data are noisy, sparse, or partially observable. In this paper, we propose a Simulation-based Generative Model for Imperfect Data (SiGMoID) that enables precise and robust inference for dynamic systems. The proposed approach integrates two key methods: (1) physics-informed neural networks with hyper-networks that construct an ODE solver, and (2) Wasserstein generative adversarial networks that estimate ODE parameters by effectively capturing noisy data distributions. We demonstrate that SiGMoID quantifies data noise, estimates system parameters, and infers unobserved system components. Its effectiveness is validated through realistic experimental examples. These show its broad applicability in domains ranging from scientific research to engineered systems and its ability to enable the discovery of full system dynamics.
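The HyperPINN idea is a hyper-network that turns ODE parameters into the weights of a solution network, t ↦ y(t). It can be sketched as a forward pass plus a physics residual. This is a hypothetical NumPy illustration, not the authors' architecture. The following are all assumptions:

- the layer sizes and the tanh activations;
- the FN right-hand side;
- the helper names `hypernet`, `target_net`, `fn_rhs`, and `physics_residual`;
- the use of central finite differences in place of automatic differentiation.

Training is omitted. That step would fit simulated parameter-solution pairs and minimize the residual.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: FN parameters (a, b, c) -> states (V, R)
N_PARAM, HIDDEN, N_STATE = 3, 16, 2
# Weights of the target network t -> y(t): W1 (1xH), b1 (H), W2 (HxS), b2 (S)
N_TARGET = HIDDEN + HIDDEN + HIDDEN * N_STATE + N_STATE

# Hypernetwork weights, randomly initialised (training is omitted in this sketch)
W_h1, b_h1 = rng.normal(0.0, 0.5, (N_PARAM, 32)), np.zeros(32)
W_h2, b_h2 = rng.normal(0.0, 0.1, (32, N_TARGET)), np.zeros(N_TARGET)


def hypernet(p):
    """Map ODE parameters p (shape (N_PARAM,)) to a flat target-network weight vector."""
    return np.tanh(p @ W_h1 + b_h1) @ W_h2 + b_h2


def target_net(theta, t):
    """Evaluate the solution surrogate y(t; theta) at times t (shape (n,))."""
    W1, b1, W2, b2 = np.split(theta, np.cumsum([HIDDEN, HIDDEN, HIDDEN * N_STATE]))
    W1 = W1.reshape(1, HIDDEN)
    W2 = W2.reshape(HIDDEN, N_STATE)
    h = np.tanh(t[:, None] @ W1 + b1)       # (n, HIDDEN)
    return h @ W2 + b2                      # (n, N_STATE)


def fn_rhs(y, p):
    """FitzHugh-Nagumo right-hand side (assumed form) for a batch of states y (n, 2)."""
    a, b, c = p
    v, r = y[:, 0], y[:, 1]
    return np.stack([c * (v - v**3 / 3.0 + r), -(v - a + b * r) / c], axis=1)


def physics_residual(p, t, dt=1e-4):
    """Mean squared ODE residual |dy/dt - f(y, p)|^2, with dy/dt from central differences."""
    theta = hypernet(p)
    dydt = (target_net(theta, t + dt) - target_net(theta, t - dt)) / (2.0 * dt)
    return float(np.mean((dydt - fn_rhs(target_net(theta, t), p)) ** 2))


p = np.array([0.2, 0.2, 3.0])
t = np.linspace(0.0, 20.0, 100)
print(target_net(hypernet(p), t).shape)  # (100, 2)
print(physics_residual(p, t))
```

In this sketch the parameters enter only through `hypernet`. A trained surrogate could therefore evaluate $\mathbf{y}(t)$ for new parameter values without re-running a numerical solver.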


Paper Structure

This paper contains 18 sections, 2 theorems, 37 equations, 9 figures, 10 tables, 1 algorithm.

Key Result

Proposition 1

Suppose that $\pi_{e}^{G}$ and $\pi_{e}^{o}$ satisfy […]. Then, if $\pi_{p}^{G}$ and $\pi_{e}^{G}$ are the desired probability density functions (pdfs) that make the Wasserstein distance $d(Y^{o},Y^{G})$ equal to zero, i.e., […], then […] and […] for all $i \in I$, $j \in J$.
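Proposition 1 links a zero Wasserstein distance between observed and generated data to recovery of the parameter and noise distributions. The toy sketch below illustrates that idea numerically under heavy simplifications. It swaps the HyperPINN for the closed-form solution of $\dot y = -ky$. It also swaps the W-GAN generator for a grid search over a single parameter $k$ and a Gaussian noise scale, and it sums one-dimensional $W_1$ distances over time points. The model, the grid, and the names `w1_1d`, `simulate`, and `loss` are illustrative assumptions. With finite samples the minimum loss is small but not exactly zero.

```python
import numpy as np


def w1_1d(u, v):
    """W1 distance between two equal-size, equally weighted 1D samples.

    In 1D the optimal transport plan pairs order statistics, so W1 equals the
    mean absolute difference between the sorted samples."""
    return float(np.mean(np.abs(np.sort(u) - np.sort(v))))


def simulate(k, t, y0=1.0):
    """Stand-in solver: closed-form solution of dy/dt = -k y (assumed toy model)."""
    return y0 * np.exp(-k * t)


rng = np.random.default_rng(1)
t = np.linspace(0.0, 4.0, 9)                   # sparse observation times
k_true, sd_true, n_rep = 0.7, 0.05, 400
y_obs = simulate(k_true, t) + rng.normal(0.0, sd_true, (n_rep, t.size))

# Common random numbers: the same standard-normal draws for every candidate
z = rng.standard_normal((n_rep, t.size))


def loss(k, sd):
    """Sum over time points of W1(observed samples, generated samples)."""
    y_gen = simulate(k, t) + sd * z
    return sum(w1_1d(y_obs[:, j], y_gen[:, j]) for j in range(t.size))


ks = np.linspace(0.3, 1.1, 41)                # candidate decay rates
sds = np.linspace(0.01, 0.10, 19)             # candidate noise standard deviations
losses = np.array([[loss(k, s) for s in sds] for k in ks])
i, j = np.unravel_index(np.argmin(losses), losses.shape)
print(f"k_hat = {ks[i]:.2f}, sd_hat = {sds[j]:.3f}, min loss = {losses[i, j]:.4f}")
```

The common random numbers `z` keep the loss surface smooth across the grid. A W-GAN would instead learn generators for $\pi_{p}^{G}$ and $\pi_{e}^{G}$ and estimate the distance with a critic network.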

Figures (9)

  • Figure 1: Graphical illustration of the functioning of SiGMoID on NSMC data. (a) ODE system with observable and missing components (b) HyperPINN training from simulated parameter-solution pairs (c) Parameter estimation and missing data recovery using W-GAN
  • Figure 2: System inference for the FN equation. Demonstration of the capability of SiGMoID to infer the true system solutions ($V$ and $R$) of the FN model. (a) NS datasets for $V$ and $R$ are provided (sample observations: blue and green dots). SiGMoID accurately infers the true system solutions ($V$ (NS), $R$ (NS): red dashed lines). The red region represents the range of the solutions inferred using SiGMoID, while the blue region represents the $95\%$ confidence interval (CI) of the observed component over time. (b) The dataset for $R$ is missing, while the dataset for $V$ is available (sample observations: blue dots). Despite the absence of $R$ from the data, SiGMoID successfully fits both the true $V$ and the true $R$. The CI for the observed $R$ is omitted because no corresponding data exist.
  • Figure 3: System inference for a protein transduction model. We infer the true system solutions (true $S$, $S_{d}$, $R$, $S_{R}$, and $R_{pp}$) using SiGMoID. (a) When all the components are given in the NS dataset, the solutions inferred using SiGMoID match the true solutions accurately. (b) As in (a), the inferred solutions match the true solutions accurately, even though $S$, $S_{d}$, $R$, and $S_{R}$ are unobserved (MC).
  • Figure 4: System inference for a Hes1 model. Data for the two components $P$ and $M$ of the Hes1 model are provided as NS data, while the third component, $H$, is unobserved (MC). SiGMoID accurately infers the true system solutions for all components and also captures the noise in the data ($95\%$ CI of obs. $P$ and $M$).
  • Figure 5: System inference for the Lorenz system. We infer the true system solutions (true $X$, $Y$, and $Z$) using SiGMoID. (a) When all the components are given in the NS dataset, the solutions inferred using SiGMoID match the true solutions accurately. (b) As in (a), the inferred solutions match the true solutions accurately, even though $Y$ and $Z$ are unobserved (MC).
  • ...and 4 more figures
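The figure captions describe two shaded regions: the range of the inferred solutions and the $95\%$ CI of an observed component. Both can be computed pointwise in time from generated samples. The sketch below is a hypothetical reconstruction of that post-processing, not the authors' code. The exponential-decay trajectories, the parameter spread, the noise level, and the helper `pointwise_bands` are all assumptions.

```python
import numpy as np


def pointwise_bands(samples, level=0.95):
    """Pointwise bands for samples of shape (n_samples, n_times).

    Returns ((min, max), (lower, upper)): the full range of the samples and the
    central `level` percentile interval at each time point."""
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.percentile(samples, [tail, 100.0 - tail], axis=0)
    return (samples.min(axis=0), samples.max(axis=0)), (lower, upper)


rng = np.random.default_rng(2)
t = np.linspace(0.0, 4.0, 41)
k = rng.normal(0.7, 0.02, size=(500, 1))       # hypothetical inferred parameter samples
sol = np.exp(-k * t)                            # inferred solutions, shape (500, 41)
obs = sol + rng.normal(0.0, 0.05, sol.shape)    # generated noisy observations y + e

(sol_min, sol_max), _ = pointwise_bands(sol)    # range band, analogous to the red region
_, (ci_lo, ci_hi) = pointwise_bands(obs)        # 95% band, analogous to the blue region
print(np.round(ci_hi - ci_lo, 3)[:5])
```

Computing the CI from the noisy samples, rather than from the noise-free solutions, is what lets the band reflect the estimated observation noise.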

Theorems & Definitions (5)

  • Proposition 1
  • Proof
  • Corollary 1
  • Definition 1
  • Definition 2