Alpha-VI DeepONet: A prior-robust variational Bayesian approach for enhancing DeepONets with uncertainty quantification

Soban Nasir Lone, Subhayan De, Rajdip Nayek

TL;DR

This work tackles uncertainty quantification in neural operator learning by replacing KL-based variational inference in DeepONets with Generalized Variational Inference based on Rényi's $\alpha$-divergence. The $\alpha$-VI DeepONet uses Bayesian neural networks for both the branch and trunk networks, enabling robust posterior estimates even under prior misspecification. Across four mechanical PDE/ODE problems, and under out-of-distribution and noisy-data scenarios, the method yields improved NMSE and competitive or better NLL compared to deterministic DeepONet and KL-VI baselines, with performance sensitive to the hyperparameter $\alpha$. The study demonstrates that $\alpha$ offers a tunable balance between mass-covering and mode-seeking behavior, enabling problem-specific robustness and uncertainty calibration, and it discusses practical extensions to richer posterior families and broader operator-learning frameworks.
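
For orientation, the divergence swap described above can be written out as follows. This is a sketch in the standard textbook form of Rényi's divergence and the generic GVI objective. The symbols $q$ (variational posterior), $\pi$ (prior), $\theta$ (network weights) and $\mathcal{D}$ (data) are notation assumed here, and the paper's own $D_{AR}^{(\alpha)}$ may use a different normalisation or argument order.

$$
D_{\alpha}(q \,\|\, \pi) = \frac{1}{\alpha - 1} \log \int q(\theta)^{\alpha}\, \pi(\theta)^{1-\alpha}\, d\theta, \qquad \alpha > 0,\ \alpha \neq 1,
$$

$$
\lim_{\alpha \to 1} D_{\alpha}(q \,\|\, \pi) = \mathrm{KL}(q \,\|\, \pi) = \int q(\theta) \log \frac{q(\theta)}{\pi(\theta)}\, d\theta .
$$

$$
\mathcal{L}_{\alpha}(q) = \mathbb{E}_{q(\theta)}\big[ -\log p(\mathcal{D} \mid \theta) \big] + D_{\alpha}(q \,\|\, \pi)
$$

Under these assumptions, setting $\alpha \to 1$ in $\mathcal{L}_{\alpha}$ recovers the negative evidence lower bound of standard KL-based variational inference, which is why the standard-VI baseline corresponds to $\alpha = 1.00$.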

Abstract

We introduce a novel deep operator network (DeepONet) framework that incorporates generalised variational inference (GVI) using Rényi's $\alpha$-divergence to learn complex operators while quantifying uncertainty. By incorporating Bayesian neural networks as the building blocks for the branch and trunk networks, our framework endows DeepONet with uncertainty quantification. The use of Rényi's $\alpha$-divergence, instead of the Kullback-Leibler divergence (KLD), commonly used in standard variational inference, mitigates issues related to prior misspecification that are prevalent in Variational Bayesian DeepONets. This approach offers enhanced flexibility and robustness. We demonstrate that modifying the variational objective function yields superior results in terms of minimising the mean squared error and improving the negative log-likelihood on the test set. Our framework's efficacy is validated across various mechanical systems, where it outperforms both deterministic and standard KLD-based VI DeepONets in predictive accuracy and uncertainty quantification. The hyperparameter $\alpha$, which controls the degree of robustness, can be tuned to optimise performance for specific problems. We apply this approach to a range of mechanics problems, including gravity pendulum, advection-diffusion, and diffusion-reaction systems. Our findings underscore the potential of $\alpha$-VI DeepONet to advance the field of data-driven operator learning and its applications in engineering and scientific domains.
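
To make the idea of Bayesian branch and trunk networks concrete, here is a minimal NumPy sketch of a forward pass. It is illustrative only and not the authors' implementation: the mean-field Gaussian weight parameterisation, the layer sizes, the `tanh` activation, the Monte Carlo estimate of the predictive mean and standard deviation, and all function names are assumptions. Training (that is, optimising the variational objective) is omitted.

```python
import numpy as np

def init_layer(rng, n_in, n_out):
    # Variational parameters of one mean-field Gaussian layer (assumed form):
    # a mean and a pre-softplus scale for every weight and bias.
    return {
        "w_mu": rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out)),
        "w_rho": np.full((n_in, n_out), -5.0),
        "b_mu": np.zeros(n_out),
        "b_rho": np.full(n_out, -5.0),
    }

def sample_layer(rng, layer):
    # Reparameterised draw: weight = mu + softplus(rho) * eps, eps ~ N(0, 1).
    softplus = lambda x: np.log1p(np.exp(x))
    w = layer["w_mu"] + softplus(layer["w_rho"]) * rng.standard_normal(layer["w_mu"].shape)
    b = layer["b_mu"] + softplus(layer["b_rho"]) * rng.standard_normal(layer["b_mu"].shape)
    return w, b

def bayes_mlp(rng, layers, x):
    # One stochastic pass through an MLP, with tanh on all hidden layers.
    for i, layer in enumerate(layers):
        w, b = sample_layer(rng, layer)
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.tanh(x)
    return x

def deeponet_sample(rng, branch, trunk, u, y):
    # Branch embeds the input function sampled at sensors, trunk embeds the
    # query coordinates; the output is their inner product.
    b = bayes_mlp(rng, branch, u)  # shape (n_functions, p)
    t = bayes_mlp(rng, trunk, y)   # shape (n_points, p)
    return b @ t.T                 # shape (n_functions, n_points)

def predict(rng, branch, trunk, u, y, n_samples=50):
    # Monte Carlo estimate of the predictive mean and standard deviation.
    draws = np.stack([deeponet_sample(rng, branch, trunk, u, y) for _ in range(n_samples)])
    return draws.mean(axis=0), draws.std(axis=0)

rng = np.random.default_rng(0)
n_sensors, width, p = 20, 32, 16  # hypothetical sizes
branch = [init_layer(rng, n_sensors, width), init_layer(rng, width, p)]
trunk = [init_layer(rng, 1, width), init_layer(rng, width, p)]

u = rng.standard_normal((3, n_sensors))   # 3 input functions at 20 sensors
y = np.linspace(0.0, 1.0, 11)[:, None]    # 11 query coordinates
mean, std = predict(rng, branch, trunk, u, y)
print(mean.shape, std.shape)
```

The spread across weight draws gives the standard deviation from which intervals such as the 95% CIs in the figures could be formed. A deterministic DeepONet corresponds to using only the means `w_mu` and `b_mu`.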

Paper Structure (24 sections, 34 equations, 10 figures, 6 tables, 1 algorithm)

Figures (10)

  • Figure 1: Architecture of the Deterministic DeepONet Model. The figure illustrates the deep neural network architecture of the deterministic DeepONet, showcasing the branch network that processes the input function and the trunk network that processes the coordinates, culminating in a point estimate of the network output.
  • Figure 2: Architecture of the $\alpha$-VI DeepONet Model. The figure depicts the deep neural network architecture of the $\alpha$-VI DeepONet, where both the branch and trunk networks are replaced with Bayesian neural networks. The output is a random variable characterised by a mean and a standard deviation, providing a probabilistic representation of the response.
  • Figure 3: The figure depicts the variation of Rényi's $\alpha$-divergence $D_{AR}^{(\alpha)}$ between two normal distributions for different values of $\alpha$. We take $p(\theta)$ to be a standard normal distribution and $q(\theta)$ to be a normal distribution with a randomly selected mean and standard deviation.
  • Figure 4: Predictive performance comparison for the antiderivative operator. The figure presents a comparative visualisation of the output function predictions generated by $\alpha$-VI DeepONet and D-DeepONet for two representative test cases (columns (a) and (b)). Each column displays results for a distinct test input function sample. The top row illustrates mean predictions and corresponding 95% confidence intervals (CIs) from $\alpha$-VI DeepONet with $\alpha = 1.25$, which demonstrates optimal performance for this problem. The second row presents predictions from standard-VI DeepONet by setting $\alpha = 1.00$. The bottom row showcases predictions from the deterministic D-DeepONet model.
  • Figure 5: Predictive performance comparison for the gravity pendulum. The figure presents a comparative visualisation of the output function predictions generated by $\alpha$-VI DeepONet and D-DeepONet for two representative test cases (columns (a) and (b)). Each column displays results for a distinct test input function sample. The top row illustrates mean predictions and corresponding 95% confidence intervals (CIs) from $\alpha$-VI DeepONet with $\alpha = 2.00$, which demonstrates optimal mean prediction performance for this problem. The second row presents predictions from standard-VI DeepONet by setting $\alpha = 1.00$. The bottom row showcases predictions from the deterministic D-DeepONet model.
  • ...and 5 more figures
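
The kind of curve described in the Figure 3 caption can be reproduced with the closed-form Rényi divergence between two univariate normal distributions. The sketch below uses the standard form with prefactor $1/(\alpha - 1)$, so it may differ from the paper's $D_{AR}^{(\alpha)}$ by a normalisation. The function names and the example values for the mean and standard deviation of $q(\theta)$ are hypothetical.

```python
import math

def kl_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    # KL(q || p) for univariate normals; this is the alpha -> 1 limit.
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
            - 0.5)

def renyi_gaussians(mu_q, sigma_q, mu_p, sigma_p, alpha):
    # Standard Renyi divergence D_alpha(q || p) between N(mu_q, sigma_q^2)
    # and N(mu_p, sigma_p^2), using the known closed form.
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    if abs(alpha - 1.0) < 1e-12:
        return kl_gaussians(mu_q, sigma_q, mu_p, sigma_p)
    var_mix = (1.0 - alpha) * sigma_q ** 2 + alpha * sigma_p ** 2
    if var_mix <= 0.0:
        return math.inf  # the defining integral diverges
    quad = alpha * (mu_p - mu_q) ** 2 / (2.0 * var_mix)
    log_term = (0.5 * math.log(var_mix)
                - (1.0 - alpha) * math.log(sigma_q)
                - alpha * math.log(sigma_p)) / (1.0 - alpha)
    return quad + log_term

# p is a standard normal as in the caption; q's parameters are made up here.
mu_q, sigma_q = 1.0, 0.5
for alpha in (0.25, 0.5, 1.0, 1.25, 2.0):
    print(alpha, renyi_gaussians(mu_q, sigma_q, 0.0, 1.0, alpha))
```

With this normalisation the divergence is non-decreasing in $\alpha$ and passes continuously through the KL divergence at $\alpha = 1$.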