A new local time-decoupled squared Wasserstein-2 method for training stochastic neural networks to reconstruct uncertain parameters in dynamical systems

Mingtao Xia, Qijing Shen, Philip Maini, Eamonn Gaffney, Alex Mogilner

TL;DR

The paper tackles the reconstruction of the distribution of uncertain parameters in dynamical systems from time-series data by introducing a local time-decoupled squared $W_2$ loss that integrates time-based transport with initial-state uncertainty. It uses a stochastic neural network (SNN) with weight uncertainty to learn parameter distributions directly, and provides theoretical results showing that the loss is well-defined and tied to the underlying parameter distribution via $W_2$ distances. The authors establish universal approximation properties for the SNN in the $W_2$ sense, including results for Gaussian mixtures, and validate the approach through numerical experiments on ODEs, PDEs, SDEs, and jump-diffusion models, where it outperforms several benchmarks. The framework is data-driven, does not require priors, and applies to both deterministic and stochastic settings; the paper also outlines avenues for refinement and extension.

Abstract

In this work, we propose and analyze a new local time-decoupled squared Wasserstein-2 method for reconstructing the distribution of unknown parameters in dynamical systems. Specifically, we show that a stochastic neural network model, which can be effectively trained by minimizing our proposed local time-decoupled squared Wasserstein-2 loss function, is an effective model for approximating the distribution of uncertain model parameters in dynamical systems. Through several numerical examples, we showcase the effectiveness of our proposed method in reconstructing the distribution of parameters in different dynamical systems.
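
As a point of reference for the Wasserstein-2 terminology above, the squared $W_2$ distance between two one-dimensional empirical distributions with equally many, equally weighted samples reduces to the mean squared difference of the sorted samples. The Python sketch below illustrates only this standard building block. It is not the paper's local time-decoupled loss, and the function name `w2_squared_1d` and the sample data are hypothetical.

```python
import numpy as np

def w2_squared_1d(x, y):
    """Squared W_2 distance between two 1D empirical distributions.

    Assumes both sample sets have the same size and uniform weights, in which
    case the optimal transport plan pairs the sorted samples in order.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("expected two 1D sample arrays of equal length")
    return float(np.mean((x - y) ** 2))

# Illustrative data only: uniform samples versus a shifted "reconstruction".
rng = np.random.default_rng(0)
c_true = rng.uniform(2.0, 4.0, size=1000)
c_hat = c_true + 0.1  # hypothetical biased estimate, not a result from the paper
print(w2_squared_1d(c_true, c_hat))
```

Because a constant shift preserves the sample ordering, the printed value is approximately $0.1^2 = 0.01$. In higher dimensions the sorting shortcut no longer applies and an optimal transport solver would be needed.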

Paper Structure

This paper contains 16 sections, 8 theorems, 177 equations, 6 figures, 3 tables, 1 algorithm.

Key Result

Theorem 2.1

Suppose are uniformly bounded, where $\bm{X}(t)$ and $\hat{\bm{X}}(t)$ are solutions to the ODEs in Eqs. \ref{ODE_model} and \ref{approximate_ODE}, respectively. Furthermore, we assume that $\bm{f}$ is continuous and uniformly bounded. Then, the limit on the RHS of Eq. \ref{local_define1} exists.

Figures (6)

  • Figure 1: A sketch of the structure of the neural network model with weight uncertainty used in xia2024local and in this paper. The weights $w_{i, j, k}\sim\mathcal{N}(a_{i, j, k}, \sigma_{i, j, k}^2)$ are independently sampled, i.e., $w_{i_1, j_1, k_1}$ is independent of $w_{i_2, j_2, k_2}$ when $(i_1, j_1, k_1)\neq (i_2, j_2, k_2)$. When using this neural network model to make predictions, for each input $\bm{x}=(x_1,\ldots,x_d)\in D\subseteq\mathbb{R}^d$, we resample all weights $\{w_{i, j, k}\}$ again. For each neuron in the hidden layer, one of the following three forward propagation methods is considered: the linear operation, the ReLU activation, or the ResNet technique.
  • Figure 2: (a) Ground truth (red dashed lines) prey population dynamics versus predicted prey population dynamics (blue solid lines) obtained with reconstructed predation rate $\hat{c}$. (b) Ground truth (red) predator population dynamics versus predicted predator population dynamics (blue) obtained with reconstructed predation rate $\hat{c}$. In (a) and (b), for clarity, we plot the first 50 groups of prey and predator trajectories. Since the predation rate $c$ in Eq. \ref{example1_model} is sampled independently for each realization of the model Eq. \ref{example1_model}, the ground truth trajectories also form a distribution. (c) Ground truth $c\sim\mathcal{U}(2, 4)$ versus the distribution of the approximate $\hat{c}$ when minimizing different loss functions. The black horizontal line and the box indicate the median and the interquartile range of the ground truth or predicted predation rate. (d) Errors in the predicted mean $|\mathbb{E}[\hat{c}] - \mathbb{E}[c]|$ and predicted variance $|\text{Var}[\hat{c}] - \text{Var}[c]|$ when minimizing different loss functions. The errors are averaged over 5 independent experiments. In (c) and (d), "local $W_2$" refers to our local time-decoupled squared $W_2$ loss function Eq. \ref{time_coupling0}, while "$W_2$" refers to the previous time-decoupled squared $W_2$ loss function in xia2024squared.
  • Figure 3: (a) Ground truth $u_{n-1}(x, 2;\theta)$ (red dashed lines) versus reconstructed $\hat{u}_{n-1}(x, 2;\hat{\theta})$ (blue solid lines). For clarity, we only plot 50 ground truth $u_{n-1}(x,2;\theta)$ numerical solutions versus 50 approximate numerical solutions $\hat{u}_{n-1}(x,2;\hat{\theta})$ in Eq. \ref{spectral_approx}. (b) Mean and standard deviations of the ground truth $u_{N-1}(x, 2)$ versus reconstructed $\hat{u}_{n-1}(x, 2)$. (c) The ground truth $(c_1, c_2)$ versus reconstructed $(\hat{c}_1, \hat{c}_2)$ when $\beta=1, \sigma_1=0.15, \sigma_2=0.1, n=12$. In (a), (b), and (c), the parameters are $n=12$ and $\beta=1, \sigma_1=0.15, \sigma_2=0.1, \sigma_3=0.2, N=12$ and $\delta=0.1$ in the loss function Eq. \ref{time_coupling0}. (d) Errors in $(\hat{c}_1, \hat{c}_2)$ w.r.t. different variances $\sigma_1, \sigma_2$ for $(c_1, c_2)$ in Eq. \ref{example2_model} (Case 1 on Page 23). (e) Errors in $(\hat{c}_1, \hat{c}_2)$ w.r.t. different values of the variance $\sigma_3$ in the initial condition $\epsilon$ and different $\delta$. $\delta=\text{inf}$ indicates that we set $\delta=\infty$, which corresponds to the time-decoupled squared $W_2$ loss function in xia2024efficient (Case 2 on Page 23). (f) Errors in $(\hat{c}_1, \hat{c}_2)$ w.r.t. different values $N$ and $c_0$ (Case 3 on Page 23).
  • Figure 4: (a)-(d) The first 50 out of 400 trajectories of the four quantities $v_{\text{vit}}(t), r_{\text{vit}}(t), c_{\text{vit}}(t), h_{\text{vit}}(t)$ obtained with the parameter vector $\bm{k}$ sampled from the ground truth distribution Eq. \ref{kinetic_model} (red dashed lines) versus trajectories obtained by using the parameter vector $\bm{k}$ sampled from the reconstructed distribution generated by the trained SNN (blue solid lines). The SNN has 3 hidden layers and 10 neurons in each layer. ResNet is used for forward propagation, and the nodes and weights are initialized by independently sampling from $\mathcal{N}(0, 0.03^2)$. (e) The reconstructed joint distribution of any two kinetic parameters in Eq. \ref{kinetic_model}. In all subplots, the red dots are sampled from the ground truth joint distribution while the blue dots are sampled from the reconstructed distribution.
  • Figure 5: (a) Ground truth trajectories generated from Eq. \ref{example4_model} versus the reconstructed trajectories generated from the approximate Eq. \ref{example4_model_approximate}. For clarity, we plot 50 ground truth trajectories and 50 reconstructed trajectories. (b) The empirical probability density function of the ground truth $|s|$ versus the empirical probability density function of the reconstructed $|\hat{s}|$ in Eqs. \ref{example4_model} and \ref{example4_model_approximate}. (c) The empirical probability density function of the ground truth $\xi$ versus the empirical probability density function of the reconstructed $|\hat{\xi}|$ in Eqs. \ref{example4_model} and \ref{example4_model_approximate}. In (a)-(c), $\sigma_0=0.3, \beta_0 = 0.35, \sigma_1=0.15, \sigma_2=0.1$, and $\delta=0.1$ in the loss function Eq. \ref{time_coupling0}. (d) and (g) The errors in the reconstructed distribution of $\hat{\sigma}$ and $\xi$ for case 1 on Page 30, respectively. (e) and (h) The errors in the reconstructed distribution of $\hat{\sigma}$ and $\xi$ for case 2 on Page 30, respectively. (f) and (i) The errors in the reconstructed distribution of $\hat{\sigma}$ and $\xi$ for case 3 on Page 30, respectively.
  • ...and 1 more figures
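
The weight-uncertainty network described in the Figure 1 caption can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the class and function names (`WeightUncertaintyLayer`, `build_snn`, `snn_forward`) are hypothetical, biases are omitted, the "ResNet technique" is assumed to mean adding a ReLU branch to the layer input, and the first hidden layer is assumed to use ReLU. Only the independent sampling $w_{i, j, k}\sim\mathcal{N}(a_{i, j, k}, \sigma_{i, j, k}^2)$, the per-input resampling, and the $\mathcal{N}(0, 0.03^2)$ initialization scale are taken from the captions.

```python
import numpy as np

class WeightUncertaintyLayer:
    """Layer whose weights are redrawn from N(mean, std^2) for every input row."""

    def __init__(self, n_in, n_out, rng, init_scale=0.03):
        self.rng = rng
        # Trainable quantities in a real model: per-weight means and std devs.
        self.mean = rng.normal(0.0, init_scale, size=(n_in, n_out))
        self.std = np.abs(rng.normal(0.0, init_scale, size=(n_in, n_out)))

    def __call__(self, x):
        # x has shape (batch, n_in); draw an independent weight matrix per row.
        eps = self.rng.standard_normal((x.shape[0],) + self.mean.shape)
        w = self.mean + self.std * eps          # shape (batch, n_in, n_out)
        return np.einsum("bi,bio->bo", x, w)

def build_snn(n_in, n_hidden, n_hidden_layers, n_out, rng, init_scale=0.03):
    sizes = [n_in] + [n_hidden] * n_hidden_layers + [n_out]
    return [WeightUncertaintyLayer(a, b, rng, init_scale)
            for a, b in zip(sizes[:-1], sizes[1:])]

def snn_forward(x, layers, mode="relu"):
    """Forward pass; mode selects the hidden-layer rule: linear, relu, or resnet."""
    z = layers[0](x)
    h = z if mode == "linear" else np.maximum(z, 0.0)
    for layer in layers[1:-1]:
        z = layer(h)
        if mode == "linear":
            h = z
        elif mode == "relu":
            h = np.maximum(z, 0.0)
        elif mode == "resnet":
            h = h + np.maximum(z, 0.0)          # assumed residual form
        else:
            raise ValueError(f"unknown mode: {mode}")
    return layers[-1](h)                        # linear output layer

rng = np.random.default_rng(0)
layers = build_snn(n_in=2, n_hidden=10, n_hidden_layers=3, n_out=1, rng=rng)
x = np.ones((5, 2))                             # five copies of the same input
print(snn_forward(x, layers, mode="resnet"))    # outputs differ across rows
```

Feeding the same input several times yields different outputs because the weights are resampled on each evaluation, which is what lets the network represent a distribution rather than a point estimate.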

Theorems & Definitions (17)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Theorem 2.1
  • Corollary 2.1
  • Theorem 2.2
  • Theorem 3.1
  • Lemma 3.1
  • proof
  • Corollary 3.1
  • ...and 7 more