This Too Shall Pass: Removing Stale Observations in Dynamic Bayesian Optimization

Anthony Bardou, Patrick Thiran, Giovanni Ranieri

TL;DR

This paper designs a Wasserstein distance-based criterion that quantifies how relevant an observation is to future predictions. It then uses this criterion to build W-DBO, a dynamic Bayesian optimization (DBO) algorithm that removes irrelevant observations from its dataset on the fly, thereby maintaining both good predictive performance and a high sampling frequency, even in continuous-time optimization tasks with an unknown horizon.

Abstract

Bayesian Optimization (BO) has proven to be very successful at optimizing a static, noisy, costly-to-evaluate black-box function $f : \mathcal{S} \to \mathbb{R}$. However, optimizing a black box that is also a function of time (i.e., a dynamic function) $f : \mathcal{S} \times \mathcal{T} \to \mathbb{R}$ remains a challenge, since a dynamic Bayesian Optimization (DBO) algorithm has to keep track of the optimum over time. This changes the nature of the optimization problem in at least three respects: (i) querying an arbitrary point in $\mathcal{S} \times \mathcal{T}$ is impossible, (ii) past observations become less and less relevant for keeping track of the optimum as time goes by, and (iii) the DBO algorithm must have a high sampling frequency so that it can collect enough relevant observations to keep track of the optimum through time. In this paper, we design a Wasserstein distance-based criterion that quantifies the relevancy of an observation with respect to future predictions. We then leverage this criterion to build W-DBO, a DBO algorithm that removes irrelevant observations from its dataset on the fly, thus maintaining both good predictive performance and a high sampling frequency, even in continuous-time optimization tasks with an unknown horizon. In numerical experiments, W-DBO outperforms state-of-the-art methods by a comfortable margin.
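
To make the removal idea concrete, here is a minimal sketch. It is not the authors' W-DBO implementation. It assumes a zero-mean GP with a separable squared-exponential kernel over space and time, and it scores each observation by the average pointwise 2-Wasserstein distance between the posterior marginals with and without that observation, evaluated on a small grid of future points. The paper's criterion is defined on the GP posteriors themselves and is bounded analytically, so the finite grid, the pointwise simplification, and all names (`kernel`, `posterior`, `removal_cost`) are illustrative assumptions.

```python
import math

def kernel(p, q, ls_space=1.0, ls_time=1.0):
    """Separable squared-exponential kernel on (x, t) pairs (an assumed choice)."""
    (x1, t1), (x2, t2) = p, q
    k_s = math.exp(-0.5 * ((x1 - x2) / ls_space) ** 2)
    k_t = math.exp(-0.5 * ((t1 - t2) / ls_time) ** 2)
    return k_s * k_t

def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z

def posterior(data, query, noise=1e-2):
    """Zero-mean GP posterior mean and standard deviation at one query point."""
    if not data:
        return 0.0, math.sqrt(kernel(query, query))
    pts = [p for p, _ in data]
    ys = [y for _, y in data]
    gram = [[kernel(p, q) + (noise if i == j else 0.0)
             for j, q in enumerate(pts)] for i, p in enumerate(pts)]
    k_star = [kernel(p, query) for p in pts]
    mean = sum(k * a for k, a in zip(k_star, solve(gram, ys)))
    var = kernel(query, query) - sum(
        k * v for k, v in zip(k_star, solve(gram, k_star)))
    return mean, math.sqrt(max(var, 0.0))

def w2_gaussian(m1, s1, m2, s2):
    """2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2)."""
    return math.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)

def removal_cost(data, index, queries):
    """Average pointwise W2 shift of future predictions if data[index] is dropped."""
    reduced = data[:index] + data[index + 1:]
    total = 0.0
    for q in queries:
        total += w2_gaussian(*posterior(data, q), *posterior(reduced, q))
    return total / len(queries)

# Toy dataset of ((x, t), y) observations; the present time is t0 = 10.
data = [((0.0, 0.0), 1.0), ((0.5, 8.0), 0.3), ((0.4, 9.5), 0.2)]
future = [(x / 4, 10.0 + dt) for x in range(5) for dt in (0.0, 0.5, 1.0)]
costs = [removal_cost(data, i, future) for i in range(len(data))]
stalest = min(range(len(data)), key=costs.__getitem__)
```

In this toy setting the observation made at $t = 0$ lies ten time lengthscales before the present, so dropping it barely moves the future predictions and it is the natural candidate for removal.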

Paper Structure

This paper contains 49 sections, 15 theorems, 98 equations, 22 figures, 5 tables, and 1 algorithm.

Key Result

Theorem 4.1

Let $t_0$ be the present time and $\mathcal{D} = \left\{((\bm x_i, t_i), y_i)\right\}_{i \in \llbracket1, n\rrbracket}$ be a dataset of observations made before $t_0$. Let $\Tilde{\mathcal{D}} = \left\{((\bm x_i, t_i), y_i)\right\}_{i \in \llbracket2, n\rrbracket}$ be the dataset without the first observation. [...] where $C(\mathcal{X}, \mathcal{Y}) = \left((k_S * k_S)(\bm x_j - \bm x_i) \cdot (k_T * k_T)_{t_0 - [...]
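
The terms $k_S * k_S$ and $k_T * k_T$ in the statement above can be illustrated with a small numerical check, assuming that $*$ denotes convolution and that the kernel is a one-dimensional squared-exponential kernel (both are assumptions, since the statement is truncated here). Under these assumptions, the self-convolution over the whole real line has the closed form $\ell \sqrt{\pi} \exp(-r^2 / (4\ell^2))$. The subscript on $(k_T * k_T)$ suggests a restricted time domain, which this sketch does not cover. The function names are hypothetical.

```python
import math

def se_kernel(r, ls=1.0):
    """Squared-exponential kernel as a function of the lag r = x_j - x_i."""
    return math.exp(-0.5 * (r / ls) ** 2)

def self_convolution_numeric(r, ls=1.0, half_width=12.0, steps=24000):
    """Midpoint-rule approximation of (k * k)(r) = integral of k(u) k(r - u) du.

    The integration window [-half_width, half_width] is sized for ls close to 1.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        u = -half_width + (i + 0.5) * h
        total += se_kernel(u, ls) * se_kernel(r - u, ls)
    return total * h

def self_convolution_closed_form(r, ls=1.0):
    """Closed form in 1D: ls * sqrt(pi) * exp(-r^2 / (4 * ls^2))."""
    return ls * math.sqrt(math.pi) * math.exp(-r ** 2 / (4.0 * ls ** 2))

lags = [0.0, 0.5, 2.0]
numeric = [self_convolution_numeric(r) for r in lags]
closed = [self_convolution_closed_form(r) for r in lags]
```

The self-convolved kernel is again Gaussian-shaped, with its lengthscale multiplied by $\sqrt{2}$, so it decays more slowly with the lag than the original kernel.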

Figures (22)

  • Figure 1: Similar values of Wasserstein distance, different effect on posteriors. For visualization purposes, only the posterior means of two posterior GPs (blue for $\mu_\mathcal{D}$ and orange for $\mu_{\Tilde{\mathcal{D}}}$) are depicted, along a single dimension (e.g., time). The Wasserstein distance between the two posteriors is shown by the green shaded area. The GPs have a small lengthscale (left) or, conversely, a large lengthscale (right) for the chosen dimension.
  • Figure 2: Normalized Wasserstein distances. As in Figure 1, a few pairs of GP posterior means $(\mu_\mathcal{D}, \mu_{\Tilde{\mathcal{D}}})$ are depicted. The top (resp., bottom) row depicts pairs of posteriors that yield a small (resp., large) ratio (\ref{eq:wasserstein-relative}). The left (resp., right) column depicts pairs of posteriors controlled by a small (resp., large) lengthscale. The prior GP mean $\mu_\emptyset = 0$ is shown as a black dashed line, and the Wasserstein distance between the posterior and the prior as a gray shaded area.
  • Figure 3: (Left) Sensitivity analysis on the Eggholder function. (Right) Aggregation of sensitivity analyses of W-DBO made on 10 synthetic functions and a real-world experiment. For aggregation purposes, the average regrets in each experiment have been normalized between 0 (lowest average regret) and 1 (largest average regret). The average performance of W-DBO over all the experiments is shown in black. Standard errors are depicted with colored bars (left) and shaded areas (right).
  • Figure 4: (Left) Average regrets of the DBO solutions during the optimization of the Ackley synthetic function. (Right) Dataset sizes of the DBO solutions during the optimization of the Ackley function.
  • Figure 5: Visual summary of the results reported in Table \ref{tab:results}. For aggregation purposes, the average regrets in each experiment have been normalized between 0 (lowest average regret) and 1 (largest average regret). The average performance of the DBO solutions is shown in black.
  • ...and 17 more figures
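
The contrast between Figures 1 and 2 (the same absolute Wasserstein distance can matter a little or a lot) can be sketched with univariate Gaussians. This is only an illustration under stated assumptions: the ratio below divides the distance between the two posteriors by the distance between the full posterior and the prior, which is one plausible reading of the captions and not necessarily the paper's exact definition. The helper names and the prior $\mathcal{N}(0, 1)$ are hypothetical.

```python
import math

def w2_gaussian(m1, s1, m2, s2):
    """2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2)."""
    return math.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)

def relative_distance(post_full, post_reduced, prior=(0.0, 1.0)):
    """Posterior-to-posterior distance, normalized by posterior-to-prior distance.

    Each argument is a (mean, standard deviation) pair. The choice of
    denominator is an assumption made for this sketch.
    """
    num = w2_gaussian(*post_full, *post_reduced)
    den = w2_gaussian(*post_full, *prior)
    if den == 0.0:
        return 0.0 if num == 0.0 else math.inf
    return num / den

# Two cases with the same absolute shift of 0.1 in the posterior mean.
far_from_prior = relative_distance((2.0, 0.2), (1.9, 0.2))
near_prior = relative_distance((0.1, 0.9), (0.0, 0.9))
```

Both cases shift the posterior mean by the same amount, but the shift is a small fraction of what the data taught the model in the first case and a large fraction in the second.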

Theorems & Definitions (28)

  • Theorem 4.1
  • Lemma A.1
  • proof
  • Proposition A.2
  • Proposition A.3
  • proof
  • proof
  • Lemma B.1
  • proof
  • Lemma B.2
  • ...and 18 more