Local Entropy Search over Descent Sequences for Bayesian Optimization

David Stenger, Armin Lindicke, Alexander von Rohr, Sebastian Trimpe

TL;DR

Local Entropy Search (LES) reframes Bayesian optimization to target the local optimum reachable from a given initial design. It propagates a Gaussian process surrogate through the optimizer to obtain a distribution over descent sequences. The LES acquisition function selects the next query by maximizing mutual information with respect to these descent sequences, combining analytic entropy calculations with Monte-Carlo sampling of optimizer trajectories. Empirically, LES achieves strong sample efficiency compared with both local and global BO baselines, especially on high-dimensional and highly complex tasks, and a probabilistic stopping rule provides a local-regret guarantee. The work presents LES as a principled, information-theoretic approach to efficient local optimization in expensive black-box settings and outlines avenues for extending it to constrained, multi-fidelity, or batch scenarios.

Abstract

Searching large and complex design spaces for a global optimum can be infeasible and unnecessary. A practical alternative is to iteratively refine the neighborhood of an initial design using local optimization methods such as gradient descent. We propose local entropy search (LES), a Bayesian optimization paradigm that explicitly targets the solutions reachable by the descent sequences of iterative optimizers. The algorithm propagates the posterior belief over the objective through the optimizer, resulting in a probability distribution over descent sequences. It then selects the next evaluation by maximizing mutual information with that distribution, using a combination of analytic entropy calculations and Monte-Carlo sampling of descent sequences. Empirical results on high-complexity synthetic objectives and benchmark problems show that LES achieves strong sample efficiency compared to existing local and global Bayesian optimization methods.
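
The idea of propagating the posterior belief through the optimizer can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: the posterior is approximated with random Fourier features for an RBF kernel, the optimizer is plain gradient descent with a fixed step size, and the function name `sample_descent_sequences` and all of its parameters are hypothetical. Each posterior function sample is differentiable in closed form, so running the optimizer on every sample yields a Monte-Carlo ensemble of descent sequences.

```python
import numpy as np


def sample_descent_sequences(X, y, x0, n_samples=64, n_steps=50, step=0.1,
                             n_features=200, lengthscale=0.5, noise=1e-2, seed=0):
    """Monte-Carlo descent sequences under an approximate GP posterior.

    Illustrative only: random Fourier features stand in for the GP, and
    fixed-step gradient descent stands in for the iterative optimizer.
    Returns an array of shape (n_samples, n_steps + 1, d).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]

    # Random Fourier features approximating a unit-variance RBF kernel.
    W = rng.normal(scale=1.0 / lengthscale, size=(n_features, d))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    scale = np.sqrt(2.0 / n_features)

    # Bayesian linear regression on the features (noise = noise variance).
    Phi = scale * np.cos(X @ W.T + b)                # (n, m)
    A = Phi.T @ Phi / noise + np.eye(n_features)     # posterior precision
    mean = np.linalg.solve(A, Phi.T @ y / noise)     # posterior mean weights
    L = np.linalg.cholesky(A)

    # Weight samples with covariance A^{-1}: theta = mean + L^{-T} z.
    z = rng.normal(size=(n_samples, n_features))
    theta = mean + np.linalg.solve(L.T, z.T).T       # (s, m)

    # Run gradient descent on every sampled function in parallel.
    paths = np.empty((n_samples, n_steps + 1, d))
    x = np.tile(np.asarray(x0, dtype=float), (n_samples, 1))
    paths[:, 0] = x
    for t in range(n_steps):
        # f_s(x) = sum_j theta_sj * scale * cos(w_j . x + b_j)
        sin_term = np.sin(x @ W.T + b)               # (s, m)
        grad = -scale * (theta * sin_term) @ W       # (s, d)
        x = x - step * grad
        paths[:, t + 1] = x
    return paths


# Usage: three observations of a quadratic, descent started at x0.
X_obs = np.array([[0.5, 0.5], [-0.5, 0.2], [0.1, -0.7]])
y_obs = (X_obs ** 2).sum(axis=1)
paths = sample_descent_sequences(X_obs, y_obs, x0=np.array([0.8, -0.6]))
endpoint_spread = paths[:, -1].std(axis=0)  # spread of the sampled end points
```

The spread of the end points gives a rough sense of how uncertain the reachable local optimum still is; an acquisition function in the spirit of LES would choose the next evaluation to reduce the uncertainty over the whole sequences rather than only the end points.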

Paper Structure

This paper contains 59 sections, 4 theorems, 44 equations, 30 figures, 13 tables, 1 algorithm.

Key Result

Theorem 1

Assume ass:wilson. Given a risk tolerance $\delta>0$, define non-zero probabilities $\delta_{\mathrm{mod}}$ and $\delta_{\mathrm{est}}$ such that $\delta_{\mathrm{mod}}+\delta_{\mathrm{est}}\le \delta$ and let $\bigl(\delta^{t}_{\mathrm{test}}\bigr)_{t\ge 0}$ be a positive sequence so that $\sum_{t=

Figures (30)

  • Figure 1: Distribution over Descent Sequences: Left: Local optimization on the unknown objective function. Middle: The prior over the objective function induces a belief over descent sequences. Right: After sampling data points the distributions over descent sequences and local optimum approach the deterministic ones. In LES the next query minimizes the entropy of the descent sequences.
  • Figure 2: Illustration of LES on a 2D example: a) After three initial evaluations, the distribution over reachable local optima is wide. b, c) As LES selects new points, evaluations concentrate near the descent sequence, and the distribution of the local optimum narrows. d) Convergence behavior of LES. After 14 evaluations, the convergence criterion (see Appx. \ref{sec:stopping}) stops the optimization.
  • Figure 3: Illustration of the LES acquisition function after three evaluations: Left: The GP posterior mean after conditioning on the observations. Second: The predictive entropy is high in regions with large posterior variance. Third to fifth: The information gain between sampled descent sequences and query locations in $\mathcal{X}$ (the LES acquisition function) is high at points that are far from existing observations and aligned with likely descent sequences.
  • Figure 4: Optimizing Gaussian Process Samples: Median, 25-, and 75-percent quantiles for the best function values found for 20 sampled objective functions with medium complexity (see Tab. \ref{tab:ranks_results_table}). LES outperforms baselines as dimensionality increases.
  • Figure 5: Synthetic and Application-Oriented Objective Functions: Median, 25-, and 75-percent quantiles for the best (normalized) function values found. Additional results in Appx. \ref{apdx:additionalres}.
  • ...and 25 more figures
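
The caption of Figure 3 ties predictive entropy to posterior variance. For a Gaussian predictive distribution this link is the standard closed form in the first line below. The second line is the generic entropy-search decomposition of mutual information, written here as an assumption about how the analytic and Monte-Carlo parts combine rather than as the paper's exact expression. The symbols $S$ (a descent sequence), $\mathcal{D}$ (the observed data), $y_x$ (the observation at query $x$), $\sigma^2(x)$ (the predictive variance), and $S^{(k)}$ (the $k$-th of $K$ sampled sequences) are notation introduced for this sketch.

```latex
\begin{align}
H\bigl[y_x \mid \mathcal{D}\bigr]
  &= \tfrac{1}{2}\log\bigl(2\pi e\,\sigma^2(x)\bigr), \\
I\bigl(y_x; S \mid \mathcal{D}\bigr)
  &= H\bigl[y_x \mid \mathcal{D}\bigr]
     - \mathbb{E}_{S}\Bigl[H\bigl[y_x \mid \mathcal{D}, S\bigr]\Bigr]
  \approx H\bigl[y_x \mid \mathcal{D}\bigr]
     - \frac{1}{K}\sum_{k=1}^{K} H\bigl[y_x \mid \mathcal{D}, S^{(k)}\bigr].
\end{align}
```

Under this reading, the first term grows with posterior variance, which matches the second panel of Figure 3, while the subtracted term is small only where knowing a sampled descent sequence would pin down the observation, which matches the remaining panels.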

Theorems & Definitions (11)

  • Remark 1
  • Remark 2
  • Definition 1: Local simple regret
  • Definition 2: ($\varepsilon,\delta$)-local optimality
  • Theorem 1: Proposition 2, wilson2024stopping
  • Lemma 1: Density of LES maximizers
  • proof
  • Lemma 2: Open--set reachability of GD paths
  • proof
  • Corollary 1: Full support of finite GD paths
  • ...and 1 more