Condensed Stein Variational Gradient Descent for Uncertainty Quantification of Neural Networks

Govinda Anantha Padmanabha, Cosmin Safta, Nikolaos Bouklas, Reese E. Jones

TL;DR

This work targets uncertainty quantification for highly parameterized neural networks by introducing condensed SVGD (cSVGD), which couples concurrent sparsification with Stein gradient flow and a graph reconciliation procedure. By using sparsifying priors (e.g., |θ|^α with multiplier λ) and kernel-based repulsion, the method evolves a particle ensemble that encodes a posterior over parameters while aligning ensemble representations on a common graph, reducing degeneracy and computational cost. Applied to a physics-informed ICNN for hyperelastic constitutive modeling, cSVGD achieves drastic reductions in active parameters (from 1020 to a few hundred) without sacrificing accuracy, and an adaptive penalty further improves sparsity-accuracy tradeoffs. The approach advances scalable Bayesian inference for scientific neural networks by exploiting parameter fungibility and graph-structured alignment to enable interpretable, efficient parameter UQ in large-scale, physics-guided models.

Abstract

We propose a Stein variational gradient descent method to concurrently sparsify, train, and provide uncertainty quantification of a complexly parameterized model such as a neural network. It employs a graph reconciliation and condensation process to reduce complexity and increase similarity in the Stein ensemble of parameterizations. Therefore, the proposed condensed Stein variational gradient (cSVGD) method provides uncertainty quantification on parameters, not just outputs. Furthermore, the parameter reduction speeds up the convergence of the Stein gradient descent as it reduces the combinatorial complexity by aligning and differentiating the sensitivity to parameters. These properties are demonstrated with an illustrative example and an application to a representation problem in solid mechanics.
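As a rough illustration of the two ingredients the abstract combines (a sparsifying prior and kernel-based particle repulsion), here is a minimal sketch of one Stein variational gradient descent step. It assumes the standard SVGD update with a plain RBF kernel, not the paper's order-$\beta$ kernel with bandwidth $\gamma$, and it omits the graph reconciliation and condensation steps entirely. The function names, the `eps` smoothing of the prior gradient, and the toy Gaussian likelihood are all hypothetical choices made for this example.

```python
import numpy as np


def grad_log_prior(theta, lam=1.0, alpha=1.0, eps=1e-8):
    """Gradient of the log prior -lam * sum(|theta|^alpha).

    eps is an assumed smoothing term that keeps the expression finite
    at theta = 0 when alpha < 1.
    """
    return -lam * alpha * np.sign(theta) * (np.abs(theta) + eps) ** (alpha - 1.0)


def rbf_kernel(X, h):
    """RBF kernel matrix and its gradient with respect to the first argument.

    K[j, i] = exp(-||x_j - x_i||^2 / h)
    gradK[j, i] = d k(x_j, x_i) / d x_j
    """
    diff = X[:, None, :] - X[None, :, :]  # diff[j, i] = x_j - x_i
    sq_dist = np.sum(diff ** 2, axis=-1)
    K = np.exp(-sq_dist / h)
    gradK = -2.0 / h * diff * K[:, :, None]
    return K, gradK


def svgd_step(X, grad_log_post, h=1.0, step=1e-2):
    """One standard SVGD update on particles X of shape (n, d)."""
    n = X.shape[0]
    G = grad_log_post(X)  # (n, d) score evaluated at each particle
    K, gradK = rbf_kernel(X, h)
    # Attraction toward high posterior density plus kernel repulsion.
    phi = (K.T @ G + gradK.sum(axis=0)) / n
    return X + step * phi


if __name__ == "__main__":
    # Toy posterior: isotropic Gaussian likelihood around mu plus the alpha = 1 prior.
    mu, sigma = np.array([1.0, 0.0]), 0.5

    def grad_log_post(X):
        return -(X - mu) / sigma ** 2 + grad_log_prior(X, lam=1.0, alpha=1.0)

    rng = np.random.default_rng(0)
    particles = rng.normal(size=(20, 2))
    for _ in range(500):
        particles = svgd_step(particles, grad_log_post, h=0.1, step=1e-2)
    print(particles.mean(axis=0))
```

With a single particle the repulsion term vanishes and the step reduces to plain gradient ascent on the log posterior; with several particles the kernel gradient pushes them apart, which is what lets the ensemble represent posterior spread rather than collapsing to the mode.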

Paper Structure

This paper contains 16 sections, 36 equations, 13 figures, 2 algorithms.

Figures (13)

  • Figure 1: Log posteriors formed from a Gaussian likelihood and priors from the $\alpha$ exponential family scaled by $\lambda$. As $\lambda$ is increased (top to bottom), the posterior resembles the prior mode more than the likelihood, and as $\alpha$ is reduced (right to left), the contours of the prior transform from diamond-shaped to more cruciform-shaped. Note the values of $\lambda$ are relative to the particular likelihood chosen for this illustration.
  • Figure 2: Illustration of the Stein gradient (red arrows) and pseudopotential (contours) for $\alpha=1$ prior and $\beta=2$ kernel, one particle (blue dot) in the field of another (green X) which is fixed: (left) fixed particle is at the origin, (middle) fixed particle is near the origin, (right) fixed particle is away from the origin. The multiplier $\lambda = 1$ and the bandwidth $\gamma = 1/10$.
  • Figure 3: Error distribution (a, Bhattacharyya distance between the posterior and the true distribution) and sparsity (b, $L_1$ norm of $\mathsf{w}_3$) with respect to increasing multiplier $\lambda$. Colors indicate kernel bandwidth $\gamma$. Each panel represents a different choice of prior order $\alpha$ and kernel order $\beta$.
  • Figure 4: Converged particles for various bandwidths using an $\alpha = 1$ prior with multiplier $\lambda = 0.1$ (a, left) and $\lambda = 1.0$ (b, right). Colors indicate kernel bandwidth.
  • Figure 5: Sequence of graphs for 3 randomly selected particles during the iteration process. Initial models are on the left, and the final models are on the right. Nodes are colored by importance, Eq. (\ref{eq:importance}), and edges are colored by weight $W_{ij}$. All plotted values are normalized per layer.
  • ...and 8 more figures