Improving the performance of Stein variational inference through extreme sparsification of physically-constrained neural network models

Govinda Anantha Padmanabha; Jan Niklas Fuhg; Cosmin Safta; Reese E. Jones; Nikolaos Bouklas

TL;DR

Uncertainty quantification (UQ) for high-dimensional neural networks is hindered by the curse of dimensionality. The paper proposes applying $L_0$ sparsification before Stein variational gradient descent ($L_0$+SVGD), which yields a compact nonlinear parameter manifold on which nonparametric UQ is then performed. Across hyperelasticity and mechanochemistry examples, $L_0$+SVGD achieves accurate predictive distributions with far fewer parameters and lower computational cost than dense SVGD, projected SVGD ($p$SVGD), or Hamiltonian Monte Carlo (HMC), while preserving physical structure through polyconvexity constraints enforced by input convex neural networks (ICNNs). The framework offers a robust, efficient approach to physics-informed uncertainty quantification and could be integrated with large-scale finite element workflows and active-learning strategies.

Abstract

Most scientific machine learning (SciML) applications of neural networks involve hundreds to thousands of parameters; hence, uncertainty quantification for such models is plagued by the curse of dimensionality. Using physical applications, we show that $L_0$ sparsification prior to Stein variational gradient descent ($L_0$+SVGD) is a more robust and efficient means of uncertainty quantification, in terms of both computational cost and performance, than the direct application of SVGD or projected SVGD methods. Specifically, $L_0$+SVGD demonstrates superior resilience to noise, the ability to perform well in extrapolated regions, and a faster convergence rate to an optimal solution.
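
For readers unfamiliar with SVGD, the particle update at the core of the method can be sketched in a few lines. This is a generic, minimal NumPy version of the standard SVGD step with an RBF kernel and median-heuristic bandwidth, not the authors' implementation; the names `rbf_kernel` and `svgd_step`, the fixed step size, and the Gaussian toy target are assumptions made for illustration. In the $L_0$+SVGD setting, the particles would be the few network parameters that survive sparsification.

```python
import numpy as np

def rbf_kernel(x):
    """RBF kernel matrix for particles x of shape (n, d), median-heuristic bandwidth."""
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]          # diff[i, j] = x_i - x_j
    sq = np.sum(diff ** 2, axis=-1)               # squared pairwise distances
    h = np.median(sq) / np.log(n + 1) + 1e-12     # bandwidth (assumed heuristic)
    k = np.exp(-sq / h)
    return k, diff, h

def svgd_step(x, grad_logp, step):
    """One SVGD update: kernel-weighted gradient ascent plus a repulsive term."""
    n = x.shape[0]
    k, diff, h = rbf_kernel(x)
    drive = k @ grad_logp(x)                      # sum_j k(x_j, x_i) grad log p(x_j)
    # sum_j grad_{x_j} k(x_j, x_i) = (2 / h) sum_j k_ij (x_i - x_j)
    repulse = (2.0 / h) * np.sum(k[:, :, None] * diff, axis=1)
    return x + step * (drive + repulse) / n

if __name__ == "__main__":
    # Toy target (hypothetical): unit-variance Gaussian centred at mu.
    rng = np.random.default_rng(0)
    mu = np.array([2.0, -1.0])
    particles = rng.normal(size=(50, 2))
    for _ in range(2000):
        particles = svgd_step(particles, lambda z: -(z - mu), step=0.3)
    print(particles.mean(axis=0), particles.std(axis=0))
```

The kernel matrix costs $O(n^2 d)$ per step for $n$ particles in $d$ dimensions, and kernel distances become less informative as $d$ grows, which is one reason reducing $d$ before running SVGD can help.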

Paper Structure (12 sections, 42 equations, 11 figures, 1 table, 2 algorithms)

Introduction
Related work
Methods
Bayesian calibration
Smoothed $L_0$ sparsification
Stein variational inference
Results
Hyperelasticity
Mechanochemistry
Conclusion
Stein Variational Gradient Descent
Input convex neural network

Figures (11)

Figure 1: Fits for $L_0$, $L_1$ and $L_2$ (left to right) regularization. Potential $\Psi$ (upper panels, compared to noiseless data) and stress $\mathbf{S}$ (lower panels, compared to 10% noisy data). The total number of parameters for the $L_0$, $L_1$ and $L_2$ fits are 7, 102 and 1005, respectively.
Figure 2: Comparison of the L-curves for $L_0$, $L_1$ and $L_2$ regularizations and increasing amounts of additive noise using the test $R^2$ score. The optimal penalty $\lambda$ depends strongly on the normalization but not as much on the added noise over the range that was studied.
Figure 3: Comparison of Wasserstein-1 distances between the $L_2$+Stein, $L_2$+projected Stein and $L_0$+Stein for noisy (left, 10% heteroskedastic noise) and clean (right) data. Note the similar trends for these two cases.
Figure 4: Comparison of the mean and standard deviation predicted by $L_0$+Stein pushforward samples with validation data containing 10% heteroskedastic noise. Color bands indicate $\pm 2$ standard deviations from the mean, which largely overlap the noisy data.
Figure 5: Convergence of Wasserstein-1 distances between the $L_0$+Stein results and the validation data distribution with 10% heteroskedastic noise as the data size $N_D$ increases.
...and 6 more figures
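
Several captions above report Wasserstein-1 distances between predicted and validation distributions. As a rough illustration of that metric, for two one-dimensional empirical samples of equal size with uniform weights, the Wasserstein-1 distance reduces to the mean absolute difference between the sorted samples. The sketch below assumes that special case; the function name `wasserstein1_1d` is hypothetical, and how the paper actually evaluates the distance is not stated in this summary.

```python
import numpy as np

def wasserstein1_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1D empirical samples."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape or a.ndim != 1:
        raise ValueError("expected two 1D samples of equal length")
    # Optimal transport in 1D pairs the i-th smallest of a with the i-th smallest of b.
    return float(np.mean(np.abs(a - b)))

# A sample shifted by one unit is at distance 1 from the original.
shifted = wasserstein1_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
```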
