BI-EqNO: Generalized Approximate Bayesian Inference with an Equivariant Neural Operator Framework

Xu-Hui Zhou, Zhuo-Ran Liu, Heng Xiao

TL;DR

This work introduces BI-EqNO, an equivariant neural operator framework for generalized approximate Bayesian inference that is designed to enhance both deterministic and stochastic approaches, and demonstrates its versatility and effectiveness on regression and sequential data assimilation.

Abstract

Bayesian inference offers a robust framework for updating prior beliefs based on new data using Bayes' theorem, but exact inference is often computationally infeasible, necessitating approximate methods. Though widely used, these methods struggle to estimate marginal likelihoods accurately, particularly due to the rigid functional structures of deterministic models like Gaussian processes and the limitations of small sample sizes in stochastic models like the ensemble Kalman method. In this work, we introduce BI-EqNO, an equivariant neural operator framework for generalized approximate Bayesian inference, designed to enhance both deterministic and stochastic approaches. BI-EqNO transforms priors into posteriors conditioned on observation data through data-driven training. The framework is flexible, supporting diverse prior and posterior representations with arbitrary discretizations and varying numbers of observations. Crucially, BI-EqNO's architecture ensures (1) permutation equivariance between prior and posterior representations, and (2) permutation invariance with respect to observational data. We demonstrate BI-EqNO's utility through two examples: (1) as a generalized Gaussian process (gGP) for regression, and (2) as an ensemble neural filter (EnNF) for sequential data assimilation. Results show that gGP outperforms traditional Gaussian processes by offering a more flexible representation of covariance functions. Additionally, EnNF not only outperforms the ensemble Kalman filter in small-ensemble settings but also has the potential to function as a "super" ensemble filter, capable of representing and integrating multiple ensemble filters for enhanced assimilation performance. This study highlights BI-EqNO's versatility and effectiveness, improving Bayesian inference through data-driven training while reducing computational costs across various applications.
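
The two symmetry properties named in the abstract can be stated compactly. The following is a sketch in notation chosen here for illustration, not taken from the paper: $\mathcal{G}$ denotes the learned operator, $\sigma$ a permutation of the $N$ prior elements, and $\pi$ a permutation of the $M$ observations. The element symbols $\mathbf{x}_i$, $\mathbf{d}_j$, and $\mathbf{x}_i^\mathrm{P}$ follow the caption of Figure 2.

$$
\{\mathbf{x}_i^\mathrm{P}\}_{i=1}^N = \mathcal{G}\left(\{\mathbf{x}_i\}_{i=1}^N,\ \{\mathbf{d}_j\}_{j=1}^M\right)
$$

$$
\text{(equivariance)}\quad \mathcal{G}\left(\{\mathbf{x}_{\sigma(i)}\}_{i=1}^N,\ \{\mathbf{d}_j\}_{j=1}^M\right) = \{\mathbf{x}_{\sigma(i)}^\mathrm{P}\}_{i=1}^N
$$

$$
\text{(invariance)}\quad \mathcal{G}\left(\{\mathbf{x}_i\}_{i=1}^N,\ \{\mathbf{d}_{\pi(j)}\}_{j=1}^M\right) = \{\mathbf{x}_i^\mathrm{P}\}_{i=1}^N
$$

In words: reordering the prior elements reorders the posterior elements in the same way, while reordering the observations leaves the posterior unchanged.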

Paper Structure

This paper contains 50 sections, 38 equations, 12 figures, 2 tables.

Figures (12)

  • Figure 1: Schematic illustration of the neural operator-based framework for generalized approximate Bayesian inference, showing the transformation from the prior distribution to the posterior distribution through data-driven training, conditioned on the observed data.
  • Figure 2: Schematic architecture of the equivariant neural operator framework for generalized approximate Bayesian inference. The input elements $\{\mathbf{x}_i\}_{i=1}^N$ are embedded through a permutation equivariant neural operator, while the observation data $\{\mathbf{d}_i\}_{i=1}^M$ are embedded through a permutation invariant neural operator. These embeddings are concatenated and passed through another permutation equivariant neural operator to generate the output $\{\mathbf{x}_i^\mathrm{P}\}_{i=1}^N$.
  • Figure 3: Architectures of (a) the permutation invariant neural operator $\mathcal{F}_\mathrm{I}$ and (b) the permutation equivariant neural operator $\mathcal{F}_\mathrm{E}$. In (a), each input element is individually processed through a shared embedding network. The resulting $N$ embeddings are then averaged to remove the influence of ordering, and the aggregated embedding is passed through a fitting network to produce the output. In (b), $\mathcal{F}_\mathrm{E}$ extends $\mathcal{F}_\mathrm{I}$ by concatenating element-wise embeddings to preserve the ordering of the input elements.
  • Figure 4: Architecture of the generalized Gaussian process (gGP) for regression. It defines a permutation equivariant mapping from the test points $X^* = \{\bm{x}_i^*\}_{i=1}^N$ to the mean $\mathsf{m}^\mathrm{NN}(X^*)$ and covariance $\mathsf{K}^\mathrm{NN}(X^*, X^*)$ of their function values, conditioned on the observation data $\mathcal{D} = \{(\bm{x}_i^\text{dat},\tilde{y}_i^\text{dat})\}_{i=1}^{M}$ in a permutation invariant manner. From left to right: The test points $X^*$ are processed through two shared neural networks, $\phi_\text{int}$ for interaction embedding and $\phi_\text{self}$ for self-embedding. Meanwhile, the observation data $\mathcal{D}$ is embedded through a shared network $\phi_\mathrm{d}$. The embeddings from both test points and observation data are concatenated and passed through a shared fitting network $\phi_\text{fit}$ to generate the predicted function values $\{\mathsf{m}^\text{NN}(\bm{x}_i^*)\}_{i=1}^N$ and the covariance matrix $\mathsf{K}^\text{NN}(X^*, X^*)$, based on the latent vectors $\{\bm{l}_i\}_{i=1}^N$ and the diagonal elements $\{d_i\}_{i=1}^N$.
  • Figure 5: Architecture of the ensemble neural filter (EnNF) for sequential data assimilation. The EnNF defines a permutation equivariant mapping from prior to posterior ensembles of individual state variables, i.e., $\{z_{i,j}^\mathrm{f}\}_{i=1}^N \mapsto \{z_{i,j}^\mathrm{a}\}_{i=1}^N$, uniquely conditioned on the information within the observation space, $\left\{\left(\mathcal{H} (\mathbf{z}_i^\text{f}), \mathbf{d}_i\right)\right\}_{i=1}^N$. The ensemble statistical properties and the realization-specific properties are embedded through a shared network $\phi_\mathrm{int}$ and another shared network $\phi_\mathrm{self}$, respectively. These embeddings are concatenated and passed through a shared fitting network $\phi_\mathrm{fit}$ to yield the updated state variables.
  • ...and 7 more figures
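
The data flow described in the captions of Figures 2 and 3 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the weights are random and untrained, the layer sizes are arbitrary, and all function names (`mlp`, `apply_mlp`, `invariant_op`, `equivariant_op`, `bi_eqno`) are hypothetical. In particular, the equivariant operator is assumed to concatenate each element's own embedding with the averaged embedding, which is one reading of Figure 3(b).

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random weights for a small tanh MLP (hypothetical stand-in, untrained)."""
    return [(rng.normal(scale=0.5, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def apply_mlp(params, x):
    """Apply the MLP to a vector or to each row of a matrix."""
    for k, (W, b) in enumerate(params):
        x = x @ W + b
        if k < len(params) - 1:
            x = np.tanh(x)
    return x

def invariant_op(embed, fit, elems):
    """F_I: shared embedding per element, average over elements, fitting network."""
    pooled = apply_mlp(embed, elems).mean(axis=0)      # averaging removes the ordering
    return apply_mlp(fit, pooled)

def equivariant_op(embed_int, embed_self, fit, elems):
    """F_E (assumed form): per-element embedding concatenated with the pooled one."""
    pooled = apply_mlp(embed_int, elems).mean(axis=0)  # shared, order-free context
    own = apply_mlp(embed_self, elems)                 # keeps one row per element
    context = np.broadcast_to(pooled, (len(elems), pooled.size))
    return apply_mlp(fit, np.concatenate([own, context], axis=1))

def bi_eqno(params, x, d):
    """Figure 2 flow: embed x (equivariant) and d (invariant), concatenate, map out."""
    hx = equivariant_op(*params["x"], x)
    hd = invariant_op(*params["d"], d)
    joined = np.concatenate([hx, np.broadcast_to(hd, (len(x), hd.size))], axis=1)
    return equivariant_op(*params["out"], joined)

dx, dd, h = 3, 2, 8  # illustrative dimensions, not from the paper
params = {
    "x": (mlp([dx, h, h]), mlp([dx, h, h]), mlp([2 * h, h, h])),
    "d": (mlp([dd, h, h]), mlp([h, h, h])),
    "out": (mlp([2 * h, h, h]), mlp([2 * h, h, h]), mlp([2 * h, h, dx])),
}

x = rng.normal(size=(5, dx))   # N = 5 prior elements
d = rng.normal(size=(4, dd))   # M = 4 observations
out = bi_eqno(params, x, d)    # one output row per prior element
perm_x = rng.permutation(5)
perm_d = rng.permutation(4)
```

Because the only interaction across elements is a mean, `bi_eqno(params, x[perm_x], d)` should match `out[perm_x]` up to floating-point rounding, and `bi_eqno(params, x, d[perm_d])` should match `out`. The same mean is what lets the sketch accept a different number of observations without any change to the weights.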