
Improving the performance of Stein variational inference through extreme sparsification of physically-constrained neural network models

Govinda Anantha Padmanabha, Jan Niklas Fuhg, Cosmin Safta, Reese E. Jones, Nikolaos Bouklas

TL;DR

Uncertainty quantification for high-dimensional neural networks is hindered by the curse of dimensionality. The paper proposes applying $L_0$ sparsification before Stein variational gradient descent ($L_0$+SVGD) to create a compact nonlinear parameter manifold and perform nonparametric UQ on this manifold. Across hyperelasticity and mechanochemistry applications, $L_0$+SVGD achieves accurate predictive distributions with far fewer parameters and lower computational cost than dense SVGD, projected SVGD ($p$SVGD), or Hamiltonian Monte Carlo (HMC), while preserving physical structure through polyconvexity constraints built on input convex neural networks (ICNNs). The framework offers a robust, efficient approach to physics-informed uncertainty quantification and may enable integration with large-scale finite element workflows and active-learning strategies.
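The summary does not say how the $L_0$ penalty is made trainable. A common choice, assumed here rather than confirmed by the source, is the hard-concrete gate relaxation of Louizos, Welling and Kingma (2018): each weight gets a stochastic gate $z \in [0, 1]$ that can be exactly zero, and the expected number of active gates acts as a differentiable $L_0$ surrogate. The minimal NumPy sketch below uses the hyperparameters from that work ($\beta = 2/3$, $\gamma = -0.1$, $\zeta = 1.1$); all function names are hypothetical.

```python
import numpy as np

# Hard-concrete hyperparameters (values from Louizos et al., 2018; assumed here).
BETA, GAMMA, ZETA = 2.0 / 3.0, -0.1, 1.1


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sample_gates(log_alpha, rng):
    """Draw stochastic gates in [0, 1], one per weight (training time)."""
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=np.shape(log_alpha))
    s = sigmoid((np.log(u) - np.log(1.0 - u) + log_alpha) / BETA)
    # Stretch to (GAMMA, ZETA), then clip so exact 0 and 1 occur with nonzero probability.
    return np.clip(s * (ZETA - GAMMA) + GAMMA, 0.0, 1.0)


def expected_l0(log_alpha):
    """Expected number of nonzero gates: a differentiable L0 surrogate."""
    return float(np.sum(sigmoid(log_alpha - BETA * np.log(-GAMMA / ZETA))))


def deterministic_gates(log_alpha):
    """Test-time gates; entries that clip to 0 prune the corresponding weights."""
    return np.clip(sigmoid(log_alpha) * (ZETA - GAMMA) + GAMMA, 0.0, 1.0)


log_alpha = np.array([-6.0, -6.0, 4.0, 0.0])  # hypothetical learned gate logits
w = np.array([0.8, -1.2, 0.5, 2.0])           # hypothetical weights
z = deterministic_gates(log_alpha)
print(z)                        # approximately [0, 0, 1, 0.5]
print(np.count_nonzero(w * z))  # 2 surviving parameters
```

During training, a network would use `w * sample_gates(log_alpha, rng)` and minimize the data misfit plus $\lambda$ times `expected_l0(log_alpha)`; after training, `deterministic_gates` produces exact zeros, and only the surviving parameters would be handed to SVGD.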

Abstract

Most scientific machine learning (SciML) applications of neural networks involve hundreds to thousands of parameters; hence, uncertainty quantification for such models is plagued by the curse of dimensionality. Using physical applications, we show that $L_0$ sparsification prior to Stein variational gradient descent ($L_0$+SVGD) is a more robust and efficient means of uncertainty quantification, in terms of both computational cost and performance, than the direct application of SVGD or projected SVGD methods. Specifically, $L_0$+SVGD demonstrates superior resilience to noise, the ability to perform well in extrapolated regions, and a faster convergence rate to an optimal solution.
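The SVGD update referred to in the abstract (Liu and Wang, 2016) moves a set of particles $\{\theta_i\}_{i=1}^n$ along $\phi(\theta_i) = \frac{1}{n}\sum_{j=1}^{n} \left[ k(\theta_j, \theta_i)\, \nabla \log p(\theta_j) + \nabla_{\theta_j} k(\theta_j, \theta_i) \right]$: the first term pulls particles toward high posterior density, and the second keeps them apart. The minimal NumPy sketch below uses an RBF kernel with the median-heuristic bandwidth; in $L_0$+SVGD the particles would live only in the space of parameters that survive sparsification. The function names, fixed step size, and toy Gaussian target are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def rbf_kernel(X):
    """RBF kernel matrix for particles X of shape (n, d), median-heuristic bandwidth."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    h = max(np.median(sq) / np.log(X.shape[0] + 1.0), 1e-8)
    return np.exp(-sq / h), h


def svgd_step(X, grad_log_p, step=0.05):
    """One explicit SVGD update of all particles."""
    n = X.shape[0]
    K, h = rbf_kernel(X)
    drive = K @ grad_log_p(X)  # sum_j k(x_j, x_i) * grad log p(x_j)
    # sum_j grad_{x_j} k(x_j, x_i) = (2 / h) * sum_j (x_i - x_j) * k(x_j, x_i)
    repulse = (2.0 / h) * (X * K.sum(axis=1, keepdims=True) - K @ X)
    return X + step * (drive + repulse) / n


# Toy check: particles started around 3 should approach a standard normal target.
rng = np.random.default_rng(0)
X = rng.normal(3.0, 1.0, size=(100, 1))
for _ in range(4000):
    X = svgd_step(X, lambda Z: -Z)  # grad log N(0, 1) = -x
print(X.mean(), X.var())  # should be close to 0 and 1
```

A plain fixed step keeps the sketch short; practical implementations often use an adaptive optimizer instead.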

Paper Structure (12 sections, 42 equations, 11 figures, 1 table, 2 algorithms)

Figures (11)

  • Figure 1: Fits for $L_0$, $L_1$ and $L_2$ (left to right) regularization. Potential $\Psi$ (upper panels, compared to noiseless data) and stress $\mathbf{S}$ (lower panels, compared to 10% noisy data). The $L_0$, $L_1$ and $L_2$ fits have 7, 102 and 1005 parameters in total, respectively.
  • Figure 2: Comparison of the L-curves for $L_{0}$, $L_{1}$ and $L_{2}$ regularizations and increasing amounts of additive noise using the test R$^{2}$ score. The optimal penalty $\lambda$ depends strongly on the normalization but not as much on the added noise over the range that was studied.
  • Figure 3: Comparison of Wasserstein-1 distances for $L_2$+Stein, $L_2$+projected Stein and $L_0$+Stein on noisy (left, 10% heteroskedastic noise) and clean (right) data. Note the similar trends in the two cases.
  • Figure 4: Comparison of the mean and standard deviation predicted by $L_0$+Stein pushforward samples with 10% heteroskedastic noisy validation data. Color bands indicate $\pm 2$ standard deviations from the mean and largely overlap the noisy data.
  • Figure 5: Convergence of the Wasserstein-1 distance between $L_0$+Stein results and the 10% heteroskedastic noise validation data distribution with increasing data size $N_D$.
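Figures 3 and 5 measure agreement between predicted and validation distributions with the Wasserstein-1 distance. For two one-dimensional empirical distributions it reduces to $W_1 = \int |F_a(x) - F_b(x)|\,dx$, the area between their empirical CDFs. The sketch below assumes the distance is computed per scalar quantity (for example, one stress component at a fixed deformation), which the captions do not specify; `wasserstein1` is a hypothetical helper that, for unweighted samples, should agree with `scipy.stats.wasserstein_distance`.

```python
import numpy as np


def wasserstein1(a, b):
    """Wasserstein-1 distance between 1D samples a and b (area between empirical CDFs)."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.sort(np.concatenate([a, b]))
    widths = np.diff(grid)
    # Empirical CDFs evaluated on each interval [grid[k], grid[k + 1]).
    cdf_a = np.searchsorted(a, grid[:-1], side="right") / a.size
    cdf_b = np.searchsorted(b, grid[:-1], side="right") / b.size
    return float(np.sum(np.abs(cdf_a - cdf_b) * widths))


print(wasserstein1([0.0, 1.0, 3.0], [5.0, 6.0, 8.0]))  # approximately 5.0: a pure shift by 5
print(wasserstein1([0.0], [0.0, 1.0]))                 # 0.5
```

Because it integrates the CDF gap, this form also handles samples of different sizes, such as a fixed set of pushforward samples compared against validation sets of growing size $N_D$.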