Uncertainty quantification of neural network models of evolving processes via Langevin sampling

Cosmin Safta, Reese E. Jones, Ravi G. Patel, Raelynn Wonnacot, Dan S. Bolintineanu, Craig M. Hamel, Sharlotte L. B. Kramer

TL;DR

This work addresses uncertainty quantification for history-dependent processes modeled by neural ordinary differential equations, coupling a data model to a trainable weight sampler within a differentiable hypernetwork. It uses Langevin sampling to draw posterior weight ensembles with a learnable score-based drift, enabling flexible trade-offs between data-model cost and posterior accuracy, and combines this with an ELBO objective and a pathwise KL bound. The method is demonstrated on chemical kinetics and material physics problems, showing advantages over variational inference and enabling efficient Bayesian last-layer variants. The approach supports epistemic and aleatory uncertainty quantification and is designed to be integrable with larger simulation workflows and experimental design settings.
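For orientation, the two ingredients named above can be written in their generic textbook forms. This is a sketch under assumptions, not the paper's exact objective: the prior $p(\mathsf{w})$, the likelihood $p(\mathsf{D} \mid \mathsf{w})$, and the Brownian motion $\mathsf{B}_t$ are notation introduced here, and the paper's learned drift and pathwise KL bound may differ in detail. The evidence lower bound for a weight distribution $\mathsf{q}$ with trainable parameters $\boldsymbol{\theta}$ is

$$
\log p(\mathsf{D}) \;\ge\; \mathbb{E}_{\mathsf{w} \sim \mathsf{q}(\mathsf{w}; \boldsymbol{\theta})}\big[ \log p(\mathsf{D} \mid \mathsf{w}) \big] \;-\; \mathrm{KL}\big( \mathsf{q}(\mathsf{w}; \boldsymbol{\theta}) \,\|\, p(\mathsf{w}) \big),
$$

and overdamped Langevin dynamics, whose stationary distribution is the posterior, reads

$$
\mathrm{d}\mathsf{w}_t \;=\; \nabla_{\mathsf{w}} \log p(\mathsf{w}_t \mid \mathsf{D}) \, \mathrm{d}t \;+\; \sqrt{2} \, \mathrm{d}\mathsf{B}_t .
$$

In the approach summarized here, the exact score $\nabla_{\mathsf{w}} \log p(\mathsf{w} \mid \mathsf{D})$ is replaced by a learnable drift.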

Abstract

We propose a scalable, approximate inference hypernetwork framework for a general model of history-dependent processes. The flexible data model is based on a neural ordinary differential equation (NODE) representing the evolution of internal states together with a trainable observation model subcomponent. The posterior distribution corresponding to the data model parameters (weights and biases) follows a stochastic differential equation with a drift term related to the score of the posterior that is learned jointly with the data model parameters. This Langevin sampling approach offers flexibility in balancing the computational budget between the evaluation cost of the data model and the approximation of the posterior density of its parameters. We demonstrate performance of the ensemble sampling hypernetwork on chemical reaction and material physics data and compare it to standard variational inference.
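The sampling step at the core of this description can be illustrated with a minimal sketch of unadjusted Langevin sampling. It makes a strong simplifying assumption: the paper learns the drift jointly with the data model, whereas here the score is the known analytic score of a one-dimensional Gaussian standing in for the posterior. The function names, target mean and variance, step size, and ensemble size are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_score(w, mean, var):
    """Score (gradient of the log density) of a 1D Gaussian N(mean, var).

    Assumption: stands in for the learned, score-based drift in the paper.
    """
    return -(w - mean) / var


def langevin_sample(score, w0, step, n_steps, rng):
    """Euler-Maruyama discretization of overdamped Langevin dynamics.

    Each update is w <- w + step * score(w) + sqrt(2 * step) * noise,
    applied independently to every member of the ensemble w0.
    """
    w = np.array(w0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(w.shape)
        w = w + step * score(w) + np.sqrt(2.0 * step) * noise
    return w


# Hypothetical target: a "posterior" over a single weight with mean 1.5, variance 0.25.
target_mean, target_var = 1.5, 0.25

rng = np.random.default_rng(0)
ensemble_init = np.zeros(5000)  # ensemble of 5000 chains, all started at zero
samples = langevin_sample(
    lambda w: gaussian_score(w, target_mean, target_var),
    ensemble_init,
    step=1e-2,
    n_steps=2000,
    rng=rng,
)

print("ensemble mean:", samples.mean())
print("ensemble variance:", samples.var())
```

The step size and the number of steps control the cost-accuracy trade-off in this toy setting: a smaller step reduces discretization bias in the stationary distribution but requires more drift evaluations to reach it.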

Paper Structure

This paper contains 21 sections, 61 equations, 14 figures, 1 algorithm.

Figures (14)

  • Figure 1: Schematics: (a) hypernetwork consisting of subcomponents: (b) sampler $\mathsf{w} \sim \mathsf{q}(\mathsf{w} | \mathsf{m}, \mathsf{D} ; \boldsymbol{\theta})$, and (c) data model $\mathsf{Y} = \mathsf{m}(\mathsf{X}; \mathsf{w})$
  • Figure 2: Langevin dynamics: (a) data density, (b) predicted density via Langevin sampling (Algorithm \ref{alg:training}), and (c) predicted density via BBVI. The mean trend is shown with a red line.
  • Figure 3: Langevin model: kernel density estimates of 1D marginal distributions and 2D joint densities for model parameters $(W,b)$. The dashed red and blue lines in the 2D density plots represent the first principal vector constructed from the available $(W,b)$ for BBVI (upper right panel) and Langevin sampling (lower left panel), respectively, and the dashed black line represents the theoretical solution (in both off-diagonal panels).
  • Figure 4: Langevin dynamics: Wasserstein $\mathcal{W}_1$ distance as a function of time and epochs for a sequence of samples.
  • Figure 5: Schlögl reaction: trajectories (upper panels) and density (lower panels).
  • ...and 9 more figures