Curvature-aware Expected Free Energy as an Acquisition Function for Bayesian Optimization

Ajith Anil Meera, Wouter Kouw

Abstract

We propose an Expected Free Energy-based acquisition function for Bayesian optimization to solve the joint learning and optimization problem, i.e., to optimize and learn the underlying function simultaneously. We show that, under specific assumptions, Expected Free Energy reduces to Upper Confidence Bound, Lower Confidence Bound, and Expected Information Gain. We prove that Expected Free Energy has unbiased convergence guarantees for concave functions. Using the results from these derivations, we introduce a curvature-aware update law for Expected Free Energy and demonstrate a proof of concept on a system identification problem for a Van der Pol oscillator. Through rigorous simulation experiments, we show that our adaptive Expected Free Energy-based acquisition function outperforms state-of-the-art acquisition functions, achieving the lowest final simple regret and the lowest error in learning the Gaussian process.
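
To make the abstract's objects concrete, the following is a minimal sketch of how an EFE-style acquisition could be evaluated on a Gaussian-process posterior, written as a risk term (KL divergence from a Gaussian preference over outcomes) minus an epistemic term (expected information gain). The function name efe_acquisition, the preference parameters y_star and sigma_star, and this exact decomposition are illustrative assumptions, not the paper's formulation.

    import numpy as np

    def efe_acquisition(mu, sigma, y_star, sigma_star, noise_var):
        # Sketch of an Expected Free Energy acquisition on a GP posterior.
        # mu, sigma  : GP posterior mean and std at candidate points
        # y_star     : preferred (target) outcome value (assumed parameter)
        # sigma_star : width of the Gaussian preference (assumed parameter)
        # noise_var  : observation-noise variance of the GP
        #
        # Risk: KL divergence between the Gaussian predictive N(mu, sigma^2)
        # and the Gaussian preference N(y_star, sigma_star^2).
        risk = (np.log(sigma_star / sigma)
                + (sigma**2 + (mu - y_star)**2) / (2.0 * sigma_star**2)
                - 0.5)
        # Epistemic value: mutual information between a new observation and
        # the latent function value, i.e. the expected information gain.
        epistemic = 0.5 * np.log1p(sigma**2 / noise_var)
        # EFE is minimized: low risk (exploitation) minus epistemic value
        # (exploration) trades the two objectives off.
        return risk - epistemic

Minimizing this quantity over candidate inputs trades off driving the predictive mean toward the preferred outcome against sampling where the GP is still uncertain.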

Paper Structure

This paper contains 16 sections, 4 theorems, 52 equations, 2 figures, and 2 tables.

Key Result

Theorem III.1

Consider the EFE without its epistemic value term. For a reference point $(\mu_0,\sigma_0)$ with $\sigma_0>0$, the first-order Taylor linearization around $(\mu_0,\sigma_0)$ yields a local linear acquisition $G(x)\approx a\,\mu(x)+b\,\sigma(x)$ with coefficients $a,b$ determined at $(\mu_0,\sigma_0)$. If $y^\ast\gg \mu_0$, then minimizing $G$ is locally equivalent, up to positive scaling and an additive constant, to minimizing $\mu(x)-\beta\,\sigma(x)$ for some $\beta>0$, i.e., the LCB acquisition.
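
For readers unfamiliar with the construction, the linearization in the theorem is a standard first-order Taylor expansion of $G$ in $(\mu,\sigma)$; the explicit form below is inferred from the statement rather than quoted from the paper:

$$G(x) \;\approx\; G(\mu_0,\sigma_0) \;+\; \underbrace{\left.\frac{\partial G}{\partial \mu}\right|_{(\mu_0,\sigma_0)}}_{a}\bigl(\mu(x)-\mu_0\bigr) \;+\; \underbrace{\left.\frac{\partial G}{\partial \sigma}\right|_{(\mu_0,\sigma_0)}}_{b}\bigl(\sigma(x)-\sigma_0\bigr),$$

so that, after dropping the constant $G(\mu_0,\sigma_0)-a\,\mu_0-b\,\sigma_0$, minimizing $G$ is locally equivalent to minimizing $a\,\mu(x)+b\,\sigma(x)$.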

Figures (2)

  • Figure 1: While both methods find the correct parameter $\kappa=3$ (repeated samples around the maximum), adaptive EFE (top) achieves better joint optimization and learning by exploring all high-curvature regions. Non-adaptive EFE (bottom), on the other hand, neglects high-curvature regions. (A minimal simulation sketch of the underlying Van der Pol problem follows this list.)
  • Figure 2: Performance of acquisition functions on the joint optimization and learning problem on a GP, for 50 randomly selected functions. EFE occupies the bottom-left portion of the graph, indicating its superior performance on joint optimization and learning.
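
As context for Figure 1, the sketch below simulates a Van der Pol oscillator and wraps one run into a scalar black-box objective that a Bayesian optimization loop could query to identify the damping parameter. Using $\kappa$ as the damping coefficient matches the caption; the squared-error objective, the helper names, and the use of scipy.integrate.solve_ivp are assumptions for illustration.

    import numpy as np
    from scipy.integrate import solve_ivp

    def van_der_pol(t, state, kappa):
        # Van der Pol dynamics: x'' - kappa * (1 - x^2) * x' + x = 0.
        x, v = state
        return [v, kappa * (1.0 - x**2) * v - x]

    def simulate(kappa, t_end=20.0, x0=(2.0, 0.0)):
        # Integrate the oscillator on a fixed time grid and return x(t).
        t_eval = np.linspace(0.0, t_end, 400)
        sol = solve_ivp(van_der_pol, (0.0, t_end), x0,
                        args=(kappa,), t_eval=t_eval)
        return sol.y[0]

    def objective(kappa, x_obs):
        # Negative mean squared simulation error against observed data:
        # a scalar objective to maximize for system identification.
        return -np.mean((simulate(kappa) - x_obs) ** 2)

    # Synthetic "observations" generated with kappa = 3, matching the figure.
    x_obs = simulate(3.0)
    print(objective(2.5, x_obs), objective(3.0, x_obs))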

Theorems & Definitions (4)

  • Theorem III.1: Derivation of LCB from EFE (with proof)
  • Theorem III.2: UCB as a Local Linearization of EFE (with proof)
  • Theorem III.3: EFE's Epistemic Term Equals EIG (with proof; see the identity below)
  • Theorem IV.1: Sufficient Condition for Unbiased Local Convergence of EFE (with proof)
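
For reference on Theorem III.3: for a GP with Gaussian observation noise, the epistemic term takes the standard expected-information-gain form below; the notation here ($\sigma^2(x)$ for the latent posterior variance at $x$, $\sigma_n^2$ for the noise variance) is assumed rather than taken from the paper:

$$\mathrm{EIG}(x) \;=\; I\bigl(y;\,f(x)\bigr) \;=\; \tfrac{1}{2}\,\ln\!\left(1+\frac{\sigma^{2}(x)}{\sigma_n^{2}}\right),$$

the mutual information between a new observation $y$ at $x$ and the latent function value $f(x)$.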