Composite Bayesian Optimization In Function Spaces Using NEON -- Neural Epistemic Operator Networks

Leonardo Ferreira Guilhoto, Paris Perdikaris

TL;DR

This paper introduces NEON (Neural Epistemic Operator Networks), an architecture for generating predictions with uncertainty using a single operator network backbone. NEON has orders of magnitude fewer trainable parameters than deep ensembles of comparable performance.

Abstract

Operator learning is a rising field of scientific computing where inputs or outputs of a machine learning model are functions defined in infinite-dimensional spaces. In this paper, we introduce NEON (Neural Epistemic Operator Networks), an architecture for generating predictions with uncertainty using a single operator network backbone, which has orders of magnitude fewer trainable parameters than deep ensembles of comparable performance. We showcase the utility of this method for sequential decision-making by examining the problem of composite Bayesian Optimization (BO), where we aim to optimize a function $f=g\circ h$. Here $h:X\to C(\mathcal{Y},\mathbb{R}^{d_s})$ is an unknown map which outputs elements of a function space, and $g: C(\mathcal{Y},\mathbb{R}^{d_s})\to \mathbb{R}$ is a known and cheap-to-compute functional. By comparing our approach to other state-of-the-art methods on toy and real-world scenarios, we demonstrate that NEON achieves state-of-the-art performance while requiring orders of magnitude fewer trainable parameters.
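The composite structure $f=g\circ h$ can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the functional `g`, the toy `surrogate_samples` function standing in for a trained probabilistic surrogate of $h$, the grid, and the candidate inputs are all hypothetical. The sketch pushes surrogate samples of $h(u)$ through the known functional $g$ and scores each candidate with a Monte Carlo lower confidence bound, assuming the objective is minimized.

```python
import math
import random


def g(values):
    """Known, cheap functional g: mean squared value of a discretised function."""
    return sum(v * v for v in values) / len(values)


def surrogate_samples(u, grid, n_samples, rng):
    """Hypothetical stand-in for a probabilistic surrogate of h.

    Each sample is one plausible function h(u), evaluated on `grid`.
    A real surrogate would be trained on observed (u, h(u)) pairs.
    """
    return [
        [math.sin(u * y) + rng.gauss(0.0, 0.1) for y in grid]
        for _ in range(n_samples)
    ]


def composite_lcb(u, grid, beta=0.1, n_samples=64, seed=0):
    """Monte Carlo lower confidence bound on f(u) = g(h(u)), for minimisation."""
    rng = random.Random(seed)
    # Only h is modelled; g is applied exactly to every sampled function.
    scores = [g(s) for s in surrogate_samples(u, grid, n_samples, rng)]
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    return mean - beta * std


grid = [i / 10 for i in range(11)]   # query points y in [0, 1]
candidates = [0.0, 0.5, 1.0, 2.0]    # hypothetical candidate inputs u
best_u = min(candidates, key=lambda u: composite_lcb(u, grid))
```

Because $g$ is known, uncertainty only has to be modelled at the level of $h$; the spread of `scores` is induced by applying $g$ to each sampled function.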

Paper Structure (24 sections, 1 theorem, 19 equations, 10 figures, 3 tables)

This paper contains 24 sections, 1 theorem, 19 equations, 10 figures, 3 tables.

Key Result

Theorem 1

Let $\epsilon>0$ and $f:X\to\mathbb{R}$ be a bounded function. Then there exists a choice of $\delta>0$ such that for any surrogate model $G_{\theta}$ we have that $|\alpha'^{\delta}_\text{L-EI}(x,z) - \alpha'_\text{EI}(x,z)|<\epsilon$. This implies that $|\alpha^{\delta}_\text{L-EI}(x) - \alpha_\te

Figures (10)

  • Figure 1: Example of $h(u)\in C([0,221]^2,\mathbb{R}^2)$ for the Cell Towers problem. The input $u\in\mathbb{R}^{30}$ encodes transmission parameters of 15 cell towers, which are used to produce the function seen above, where signal intensity and interference are plotted, respectively. This information is then used to compute a score $f(u)=g(h(u))\in\mathbb{R}$ which evaluates the quality of cellular service in the region. By using operator composite BO, we take advantage of the known compositional structure of $f=g\circ h$, and only need to model the behaviour of $h$.
  • Figure 2: Diagrams for the architectures used in this paper. The NEON architecture (top) combines the deterministic output of the base network with the stochastic output of the small EpiNet in order to produce predictions $f_\theta(u,y,z)$. The base network (bottom left) uses an encoder/decoder structure, with the encoder being dependent only on $u$, while the decoder receives as input the latent representation $\beta(u)$ and a Fourier feature encoding of the query point $y$. Finally, the EpiNet (bottom right) receives as input features $\tilde{x}=\texttt{sg}[\phi_\xi(u,y)]$ from the base network along with a random epistemic index $z\sim P_Z$. Here, $\texttt{sg}$ denotes the "stop gradient" operation. The EpiNet is composed of a learnable component $\sigma^L_\eta$ that changes through training, and a prior network $\sigma^P$ that is not affected by the data.
  • Figure 3: Diagrams representing the two decoders used in the experiments considered in this paper. On the left, the Concat Decoder concatenates $\beta$ and $y$, feeding this larger vector into an MLP. On the right, the Split Decoder breaks $\beta$ into smaller components $\beta^1,\dots, \beta^N$ which are progressively concatenated and fed into intermediate layers of an MLP.
  • Figure 4: Experimental results for the Environment Model (left) and Brusselator PDE (right) problems. In both cases we plot the acquisition function that resulted in the lowest final mean objective across 10 different seeds for the NEON, RPN [BHOURI2023116428] and GP [maddox2021bayesian] approaches. Remaining comparisons across different methods can be seen in the appendix. Our approach performs significantly better on the Environment Model problem, and comparably well on the Brusselator PDE. Following [BHOURI2023116428], the uncertainty bands indicate 20% of the standard deviation band.
  • Figure 5: Experimental results for the Optical Interferometer (left) and Cell Towers (right) problems. For the interferometer case, we plot the acquisition function that resulted in the lowest final mean objective across 5 different seeds for the NEON, RPN [BHOURI2023116428] and GP [maddox2021bayesian] approaches. For the cell towers case, we plot comparisons between the L-EI acquisition function and LCB using $\beta=0.1$. Remaining comparisons across different methods can be seen in the appendix. Our approach performs significantly better on the Optical Interferometer problem, and shows good behaviour on the Cell Towers problem. Following [BHOURI2023116428], the uncertainty bands indicate 20% of the standard deviation band.
  • ...and 5 more figures
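
The forward pass described in the Figure 2 caption can be sketched as follows. This is a toy illustration under several assumptions that the captions do not confirm: the base network and EpiNet are reduced to tiny hand-written functions, the base and EpiNet outputs are combined by addition, the EpiNet output is the sum of its learnable and prior parts, and $P_Z$ is a standard normal. All function names are hypothetical.

```python
import math
import random


def base_network(u, y):
    """Toy deterministic base network: returns (prediction, features)."""
    features = [math.tanh(u + y), math.tanh(u - y)]
    prediction = 0.5 * features[0] - 0.25 * features[1]
    return prediction, features


def linear(weights, inputs):
    """Dot product of a weight vector with an input vector."""
    return sum(w * x for w, x in zip(weights, inputs))


def epinet(features, z, eta, prior_weights):
    """Toy EpiNet: learnable part (weights eta) plus a fixed prior part."""
    x = features + z  # concatenate features with the epistemic index z
    return linear(eta, x) + linear(prior_weights, x)


def neon_predict(u, y, z, eta, prior_weights):
    """Base prediction plus EpiNet output (additive combination is assumed)."""
    prediction, features = base_network(u, y)
    # Stop gradient: in an autodiff framework the features would be detached
    # here so that EpiNet training does not update the base network. Plain
    # Python has no gradients, so the copy only marks where that would happen.
    detached = list(features)
    return prediction + epinet(detached, z, eta, prior_weights)


rng = random.Random(0)
dim_z = 2
eta = [0.0] * (2 + dim_z)  # learnable weights, shown here untrained
prior_weights = [rng.gauss(0.0, 1.0) for _ in range(2 + dim_z)]  # never trained

# Sampling several epistemic indices z gives a spread of predictions
# for the same (u, y), which serves as the uncertainty estimate.
samples = []
for _ in range(100):
    z = [rng.gauss(0.0, 1.0) for _ in range(dim_z)]
    samples.append(neon_predict(0.3, 0.7, z, eta, prior_weights))
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
```

In this sketch a single base network is shared across all samples and only the index `z` changes, which is where the parameter savings relative to a deep ensemble would come from.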

Theorems & Definitions (1)

  • Theorem 1