
A local squared Wasserstein-2 method for efficient reconstruction of models with uncertainty

Mingtao Xia, Qijing Shen

TL;DR

The effectiveness of the proposed local squared Wasserstein-2 method is demonstrated across several uncertainty quantification tasks, including linear regression with coefficient uncertainty, training neural networks with weight uncertainty, and reconstructing ordinary differential equations (ODEs) with a latent random variable.

Abstract

In this paper, we propose a local squared Wasserstein-2 ($W_2$) method to solve the inverse problem of reconstructing models with uncertain latent variables or parameters. A key advantage of our approach is that it does not require prior information on the distribution of the latent variables or parameters in the underlying models. Instead, our method can efficiently reconstruct the distributions of the output associated with different inputs based on empirical distributions of observation data. We demonstrate the effectiveness of our proposed method across several uncertainty quantification (UQ) tasks, including linear regression with coefficient uncertainty, training neural networks with weight uncertainty, and reconstructing ordinary differential equations (ODEs) with a latent random variable.

Paper Structure (17 sections, 2 theorems, 68 equations, 6 figures, 2 tables)

This paper contains 17 sections, 2 theorems, 68 equations, 6 figures, 2 tables.

Key Result

Theorem 4.3

For each $\bm{x}\in D$, let $N(\bm{x}, \delta)$ denote the number of samples $(\tilde{\bm{x}}, \tilde{\bm{y}})\in S$ such that $|\tilde{\bm{x}}-\bm{x}|_x\leq\delta$, and let $N$ denote the total number of samples in the empirical distribution. Assuming that each input $\bm{x}$ is independently sampled […] where $\tilde{W}_{2, \delta}^{2, \text{e}}(\bm{y}, \hat{\bm{y}})$ is the local $W_2$ distance defined […]
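To make the neighborhood-based comparison in Theorem 4.3 concrete, the following is a minimal NumPy sketch of a local squared $W_2$ loss. It rests on assumptions that are not taken from the paper: the outputs are scalar, $|\cdot|_x$ is the Euclidean norm, and the loss averages, over every sample $\bm{x}_i$, the squared $W_2$ distance between the observed and predicted outputs of all samples within $\delta$ of $\bm{x}_i$. The function names `w2_squared_1d` and `local_squared_w2` are hypothetical, and the exact definition in Eq. \ref{localw2} may differ. The sketch uses the standard fact that, for two one-dimensional empirical distributions with the same number of equally weighted atoms, the optimal coupling pairs the sorted samples.

```python
import numpy as np


def w2_squared_1d(a: np.ndarray, b: np.ndarray) -> float:
    """Squared W2 distance between two 1-D empirical distributions with the
    same number of equally weighted atoms: the optimal coupling matches the
    sorted samples."""
    return float(np.mean((np.sort(a) - np.sort(b)) ** 2))


def local_squared_w2(x: np.ndarray, y: np.ndarray, y_hat: np.ndarray,
                     delta: float) -> float:
    """Hypothetical local squared W2 loss for scalar outputs.

    x: (N, d) inputs; y: (N,) observed outputs; y_hat: (N,) model outputs at
    the same inputs, each drawn with a fresh sample of the uncertain
    parameters. For every x_i, compare observed and predicted outputs of all
    samples within distance delta of x_i, then average over i.
    """
    # Pairwise Euclidean distances between inputs (assumed choice of norm).
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    losses = []
    for i in range(len(x)):
        neighbors = dists[i] <= delta  # always contains x_i itself
        losses.append(w2_squared_1d(y[neighbors], y_hat[neighbors]))
    return float(np.mean(losses))


# Hypothetical usage on synthetic data with input-independent output noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(200, 1))
y = 2.0 * x[:, 0] + rng.normal(0.0, 0.3, size=200)        # observations
y_noisy = 2.0 * x[:, 0] + rng.normal(0.0, 0.3, size=200)  # matches mean and spread
y_mean_only = 2.0 * x[:, 0]                               # matches the mean only
print(local_squared_w2(x, y, y_noisy, delta=0.1))
print(local_squared_w2(x, y, y_mean_only, delta=0.1))
```

The second value is expected to be larger: a deterministic predictor cannot reproduce the spread of the observations inside each neighborhood, which is the property that lets a distribution-level loss, unlike the MSE, recover output uncertainty.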

Figures (6)

  • Figure 1: (a) The predicted $\hat{\bm{y}}(\bm{x};\hat{\omega})$ versus the ground truth $\bm{y}(\bm{x};\omega)$. To illustrate, we take $\bm{x}$ on the line $\bm{x}=(x_0, x_0, x_0)$ with $x_0=-0.3+0.1i$, $i=0,\ldots,9$. At each $\bm{x}$, we independently sample 100 values of $\omega=(\omega_1, \omega_2, \omega_3, \omega_4)$ in Eq. \ref{example2_model} and 100 values of $\hat{\omega}=(\hat{\omega}_1, \hat{\omega}_2, \hat{\omega}_3, \hat{\omega}_4)$ in Eq. \ref{example2_approx}, and plot the 100 ground-truth outputs $y(\bm{x};\omega)$ against the 100 predicted outputs $\hat{y}(\bm{x};\hat{\omega})$. (b) The average relative errors in $\hat{b}_i$ and $\hat{\sigma}_i$ w.r.t. the neighborhood size $\delta$ under two different norms of the input $\bm{x}$: $|\bm{x}|_{\text{homo}}$ and $|\bm{x}|_{\text{hete}}$. (c) The average relative errors in $\hat{b}_i$ and $\hat{\sigma}_i$ w.r.t. the number of training samples $N$. In (c), the norm for $\bm{x}$ is $|\bm{x}|_{\text{hete}}$ (defined in Eq. \ref{norm_def}) and the neighborhood size is $\delta=0.1$.
  • Figure 2: (a)-(i) The ground truth $y(x;\omega)$ plotted against the predicted $\hat{y}(x;\hat{\omega})$ on the testing set, where $\hat{y}$ is obtained either by minimizing different loss functions (defined in Supplement \ref{def_loss}) or by using the BNN method. (j) The average errors in the mean and standard deviation of $\hat{y}$ on the testing set for each loss function and for the BNN method. The neural network model with weight uncertainty (Fig. \ref{fig:nn_model}) trained by minimizing our local squared $W_2$ loss yields the smallest errors among all methods. Minimizing the local MMD loss gives results comparable to minimizing the local squared $W_2$ loss, likely because the MMD also measures the discrepancy between two probability distributions. However, unlike our local squared $W_2$ method, which is analyzed in Subsection \ref{W2_loss}, the local MMD loss has no theoretical guarantee explaining its success.
  • Figure 3: (a) The predicted $\hat{\bm{y}}(\tilde{\bm{x}}; \hat{\omega})$ of the neural network with weight uncertainty (Fig. \ref{fig:nn_model}) versus the ground truth $\bm{y}(\tilde{\bm{x}};\omega)$ for 10 randomly selected samples $(\bm{x}, y)$ in the testing set $T$ satisfying $|\{\tilde{\bm{x}}\in T: |\tilde{\bm{x}} - \bm{x}|_{x}\leq\delta_0 \}|\geq 5$. (b) The predicted means $\mathrm{E}[\hat{y}(\tilde{\bm{x}};\hat{\omega}) \mid |\tilde{\bm{x}}-\bm{x}|_{x}\leq\delta_0]$ and standard deviations $\text{SD}[\hat{y}(\tilde{\bm{x}};\hat{\omega}) \mid |\tilde{\bm{x}}-\bm{x}|_{x}\leq\delta_0]$ of the neural network model with weight uncertainty (Fig. \ref{fig:nn_model}) versus the ground-truth mean $\mathrm{E}[y(\tilde{\bm{x}};\omega) \mid |\tilde{\bm{x}}-\bm{x}|_{x}\leq\delta_0]$ and standard deviation $\text{SD}[y(\tilde{\bm{x}};\omega) \mid |\tilde{\bm{x}}-\bm{x}|_{x}\leq\delta_0]$ for 10 randomly selected samples $(\bm{x}, y)$ in $T$ satisfying the same neighborhood condition. In both (a) and (b), the neural network model with weight uncertainty is trained by minimizing the local squared $W_2$ loss Eq. \ref{localw2} with $\delta=0.05$ (here $\delta$ is the neighborhood size in the loss function, which differs from $\delta_0=0.2$). (c) The average relative errors in the mean and standard deviation of predictions $\hat{y}$ from the neural network model with weight uncertainty trained by minimizing the local squared $W_2$ loss Eq. \ref{localw2}, versus those from a neural network model without weight uncertainty trained by minimizing the MSE loss. In (c), the neighborhood size $\delta$ applies only to the model with weight uncertainty trained with the local squared $W_2$ loss; the results of the model without weight uncertainty trained with the MSE loss therefore do not change with $\delta$.
  • Figure 4: (a)-(d) Comparison between the ground truth $y_i(\bm{y}_0, t;\omega)$ and the predicted $\hat{y}_i(\bm{y}_0,t;\hat{\omega})$, $i=1,2,3,4$, when the standard deviation of the initial condition is $a=0$ and $\sigma=0.25$ in the distribution of the model parameter $\omega$. (e) Comparison between the ground truth $y_1(\bm{y}_0, t;\omega)$ and the predicted $\hat{y}_1(\bm{y}_0,t;\hat{\omega})$ ($a=0.3, \sigma=0.25$). (f) The ground truth $y_1(\bm{y}_0, t;\omega)$ versus the predicted $\hat{y}_1(\bm{y}_0,t;\hat{\omega})$ ($a=0, \sigma=0.4$). In (a)-(f), the ground truth $y_i(\bm{y}_0, t;\omega)$ are trajectories in the testing set, and the predicted $\hat{y}_i(\bm{y}_0, t;\hat{\omega})$ are generated from the initial conditions in the testing set (the testing and training sets share the same $a$ and $\sigma$). (g) Means and standard deviations of the ground truth $g_1(\bm{y}, \omega)$ versus the predicted $\hat{g}_1(\bm{y}, \hat{\omega})$ ($a=0, \sigma=0.25$). (h) Means and standard deviations of the ground truth $g_1(\bm{y}, \omega)$ versus the predicted $\hat{g}_1(\bm{y}, \hat{\omega})$ ($a=0, \sigma=0.4$). (i) Means and standard deviations of the ground truth $g_1(\bm{y}, \omega)$ versus the predicted $\hat{g}_1(\bm{y}, \hat{\omega})$ ($a=0.3,\sigma=0.25$). In (g)-(i), $\bm{y}=(1-z, 1+z, 1-z, 1+z)$ with $z=0.05i$, $i=0,\ldots,10$. (j) The errors $\frac{\tilde{W}_2^2(\bm{y}(\bm{y}_0, t;\omega), \hat{\bm{y}}(\bm{y}_0, t; \hat{\omega}))}{\mathrm{E}[\|\bm{y}(\bm{y}_0, t;\omega)\|^2]}$ in $\hat{\bm{y}}$ and $\frac{\mathrm{E}[W_2^2(\eta_{\bm{y}(t), t}, \hat{\eta}_{\bm{y}(t), t})]}{\mathrm{E}[\|\bm{g}(\bm{y}(t), t,\omega)\|^2]}$ in $\hat{\bm{g}}$ at different times $t$ when $a=0, \sigma=0.25$, where $\eta_{\bm{y}, s}$ and $\hat{\eta}_{\bm{y}, s}$ denote the distributions of $\bm{g}(\bm{y}, s, \omega)$ and $\hat{\bm{g}}(\bm{y}, s, \hat{\omega})$, respectively. (k) Errors in $\hat{\bm{y}}$ (defined in Eq. \ref{ode_error}) for different $a$ and $\sigma$. (l) Errors in $\hat{\bm{g}}$ and $\hat{\bm{y}}$ (defined in Eq. \ref{ode_error}) for different $a$ and $\sigma$. The errors of $\hat{\bm{y}}$ and $\hat{\bm{g}}$ are evaluated on the testing sets.
  • Figure 5: A sketch of the structure of the neural network model with weight uncertainty used in this paper. The weights $w_{i, j, k}\sim\mathcal{N}(a_{i, j, k}, \sigma_{i, j, k}^2)$ are sampled independently, i.e., $w_{i_1, j_1, k_1}$ is independent of $w_{i_2, j_2, k_2}$ whenever $(i_1, j_1, k_1)\neq (i_2, j_2, k_2)$. When this model is used to make predictions, all weights $\{w_{i, j, k}\}$ are resampled for each input $\bm{x}=(x_1,\ldots,x_d)\in D\subseteq\mathbb{R}^d$. Forward propagation uses either a standard feed-forward structure or the ResNet technique he2016deep.
  • ...and 1 more figure
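The weight-sampling scheme described for Figure 5 can be sketched in NumPy as follows. This is a hypothetical illustration, not the authors' code: the class name `StochasticWeightMLP`, the layer sizes, the tanh activation, the initial values of $a_{i,j,k}$ and $\sigma_{i,j,k}$, and the omission of bias terms and of the ResNet variant are all assumptions. Only the forward pass is shown; training would fit the means and standard deviations, for example by minimizing a local squared $W_2$ loss with an automatic-differentiation library.

```python
import numpy as np


class StochasticWeightMLP:
    """Hypothetical feed-forward network whose weights are independent
    Gaussians w ~ N(a, sigma^2). A fresh set of weights is drawn for every
    input row, so repeated inputs yield a distribution of outputs.
    Assumptions: tanh hidden activations, no bias terms, no ResNet skips."""

    def __init__(self, sizes, rng):
        self.rng = rng
        pairs = list(zip(sizes[:-1], sizes[1:]))
        # Mean a and standard deviation sigma for every weight matrix.
        self.means = [rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)) for m, n in pairs]
        self.stds = [np.full((m, n), 0.1) for m, n in pairs]

    def __call__(self, x):
        """x: (N, d) inputs -> (N, out) outputs, one weight draw per row."""
        h = x
        for layer, (a, s) in enumerate(zip(self.means, self.stds)):
            # Independent weight matrix for each of the N inputs: (N, m, n).
            w = a + s * self.rng.standard_normal((x.shape[0],) + a.shape)
            h = np.einsum("ni,nij->nj", h, w)  # per-row matrix product
            if layer < len(self.means) - 1:
                h = np.tanh(h)
        return h


# Hypothetical usage: push one input through the network many times.
rng = np.random.default_rng(1)
net = StochasticWeightMLP([3, 16, 16, 1], rng)
x0 = np.tile([[0.1, 0.1, 0.1]], (500, 1))  # the same input repeated 500 times
samples = net(x0)[:, 0]
print(samples.mean(), samples.std())  # the spread comes only from weight sampling
```

Because each row receives its own weight draw, evaluating the network on many copies of one input yields an empirical output distribution at that input, which is the kind of object a local $W_2$ loss compares against observed data.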

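The neighborhood statistics in Figure 3(b) can be illustrated with a short NumPy sketch. The helper names `local_mean_sd` and `relative_error`, the Euclidean choice for $|\cdot|_x$, the population standard deviation (`ddof=0`), and the synthetic data are assumptions; $\delta_0=0.2$ and the minimum neighborhood size of 5 follow the Figure 3 caption.

```python
import numpy as np


def local_mean_sd(x_test, values, center, delta0=0.2, min_count=5):
    """Mean and standard deviation of `values` over the test inputs lying
    within delta0 of `center` (Euclidean norm assumed).

    Returns None when fewer than `min_count` test inputs qualify, mirroring
    the selection rule |{x~ in T : |x~ - x| <= delta0}| >= 5.
    """
    mask = np.linalg.norm(x_test - center, axis=1) <= delta0
    if mask.sum() < min_count:
        return None
    selected = values[mask]
    return float(selected.mean()), float(selected.std())


def relative_error(predicted, truth):
    """Relative error |predicted - truth| / |truth| (truth assumed nonzero)."""
    return abs(predicted - truth) / abs(truth)


# Hypothetical usage: compare ground-truth and predicted local statistics.
rng = np.random.default_rng(2)
x_test = rng.uniform(-1.0, 1.0, size=(300, 2))
y_true = 3.0 + x_test.sum(axis=1) + rng.normal(0.0, 0.2, size=300)
y_pred = 3.0 + x_test.sum(axis=1) + rng.normal(0.0, 0.25, size=300)
center = x_test[0]
stats_true = local_mean_sd(x_test, y_true, center)
stats_pred = local_mean_sd(x_test, y_pred, center)
if stats_true is not None and stats_pred is not None:
    print("mean error:", relative_error(stats_pred[0], stats_true[0]))
    print("SD error:", relative_error(stats_pred[1], stats_true[1]))
```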
Theorems & Definitions (4)

  • Definition 4.1
  • Theorem 4.3
  • Proposition S6.1
  • proof