A Bayesian Approach to Low-Discrepancy Subset Selection

Nathan Kirk

TL;DR

This paper establishes, for the first time, that the subset selection problem with respect to kernel discrepancies is NP-hard, and proposes a Bayesian Optimization procedure for the problem that uses the recent notion of deep embedding kernels.

Abstract

Low-discrepancy designs play a central role in quasi-Monte Carlo methods and are increasingly influential in other domains such as machine learning, robotics and computer graphics. In recent years, one low-discrepancy construction method called subset selection has received considerable attention. Given a large population, one selects a small subset that is optimal with respect to a discrepancy-based objective. Versions of this problem are known to be NP-hard. In this text, we establish, for the first time, that the subset selection problem with respect to kernel discrepancies is also NP-hard. Motivated by this intractability, we propose a Bayesian Optimization (BO) procedure for the subset selection problem that uses the recent notion of deep embedding kernels. We demonstrate the performance of the BO algorithm in minimizing discrepancy measures and note that the framework is broadly applicable to any design criterion.
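To make the objective concrete, the following is a minimal sketch of subset selection under one kernel discrepancy: the squared maximum mean discrepancy (MMD) between a size-$m$ subset and the full population. The Gaussian kernel, its bandwidth, the helper names (`mean_kernel`, `squared_mmd`, `random_search`) and the random-search baseline are all illustrative assumptions; this is not the BO procedure of the paper.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=0.5):
    """Gaussian (RBF) kernel; the bandwidth value is an illustrative assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, kernel=gaussian_kernel):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    return sum(kernel(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def squared_mmd(subset, population, pop_term=None, kernel=gaussian_kernel):
    """Squared MMD between the empirical measures of subset and population."""
    if pop_term is None:
        pop_term = mean_kernel(population, population, kernel)
    return (mean_kernel(subset, subset, kernel)
            - 2.0 * mean_kernel(subset, population, kernel)
            + pop_term)


def random_search(population, m, trials=100, seed=0):
    """Baseline: keep the best of `trials` uniformly random size-m subsets."""
    rng = random.Random(seed)
    pop_term = mean_kernel(population, population)  # constant across subsets
    best_idx, best_val = None, float("inf")
    for _ in range(trials):
        idx = rng.sample(range(len(population)), m)
        val = squared_mmd([population[i] for i in idx], population, pop_term)
        if val < best_val:
            best_idx, best_val = sorted(idx), val
    return best_idx, best_val


# Toy population: 60 pseudo-random points in [0, 1]^2 (hypothetical data).
rng = random.Random(1)
population = [(rng.random(), rng.random()) for _ in range(60)]
best_idx, best_val = random_search(population, m=5)
print(best_idx, best_val)
```

Random search only samples a tiny fraction of the possible subsets, which is one way to see why a guided search such as Bayesian Optimization is attractive for this problem.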

Paper Structure (11 sections, 1 theorem, 31 equations, 3 figures, 1 algorithm)

Key Result

Theorem 1

For the inputs $P_N=\{\mathbf{X}_i\}_{i=1}^N\subset[0,1]^d$, an integer $m\le N$, a threshold $\tau>0$ and a positive definite kernel $k:[0,1]^d \times [0,1]^d \rightarrow \mathbb{R}$, the decision version of the kernel discrepancy subset selection problem is NP-hard.
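The summary above does not state the decision problem itself. A natural reading, taken here as an assumption, is: does there exist a subset of size $m$ whose kernel discrepancy is at most $\tau$? The sketch below answers that question by exhaustive enumeration, which is only feasible for tiny inputs. The Gaussian kernel, the use of the population's empirical measure as the reference, and the names `kernel_discrepancy_sq` and `exists_subset` are hypothetical choices for illustration.

```python
import itertools
import math


def gaussian_kernel(x, y, bandwidth=0.5):
    """Gaussian (RBF) kernel; an illustrative positive definite kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def kernel_discrepancy_sq(subset, reference, kernel=gaussian_kernel):
    """Squared kernel discrepancy (MMD) of subset against a reference point set."""
    def mean_k(xs, ys):
        return sum(kernel(x, y) for x in xs for y in ys) / (len(xs) * len(ys))
    return (mean_k(subset, subset)
            - 2.0 * mean_k(subset, reference)
            + mean_k(reference, reference))


def exists_subset(points, m, tau, discrepancy_sq):
    """Return the first index tuple whose discrepancy is <= tau, else None.

    Enumerates all C(N, m) subsets, so the cost grows combinatorially.
    """
    for idx in itertools.combinations(range(len(points)), m):
        if discrepancy_sq([points[i] for i in idx]) <= tau ** 2:
            return idx
    return None


# Tiny hypothetical instance: five equally spaced points on [0, 1].
points = [(0.1,), (0.3,), (0.5,), (0.7,), (0.9,)]


def disc(subset):
    return kernel_discrepancy_sq(subset, points)


print(exists_subset(points, 2, 1.0, disc))   # loose threshold: a witness exists
print(exists_subset(points, 2, 1e-6, disc))  # very tight threshold: no witness
print(math.comb(1000, 25))                   # subsets to check for N=1000, m=25
```

For the sizes used in the figures ($N=1000$, $m=25$) the number of candidate subsets exceeds $10^{40}$, so exhaustive search is out of reach and heuristic or model-based search is needed in practice.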

Figures (3)

  • Figure 1: Symmetric discrepancy minimization for $N=1000$ and $m=25$.
  • Figure 2: Maximum mean discrepancy minimization for a two-component Gaussian mixture using the random, GLS, BO-DS and BO-DE methods, with $N=1000$ and $m=25$. The resulting subsets are shown for GLS (Middle) and BO-DE (Right).
  • Figure 3: $L_\infty$ star discrepancy minimization for $N=1000$ and $m=25$.

Theorems & Definitions (4)

  • Definition 1: $L_\infty$ Star Discrepancy
  • Definition 2: Maximum Mean Discrepancy
  • Theorem 1
  • proof
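
For orientation, the two definitions listed above are usually stated as follows. These are the standard textbook forms, given here as an assumption; the paper's own notation and normalization may differ. Here $\lambda$ denotes Lebesgue measure, $[0,\mathbf{q})$ is the axis-aligned box anchored at the origin, and $\mu$, $\nu$ are probability measures on $[0,1]^d$.

```latex
% L_infinity star discrepancy of a point set P_N in [0,1]^d
D_\infty^{*}(P_N)
  = \sup_{\mathbf{q} \in [0,1)^d}
    \left| \frac{\left| P_N \cap [0,\mathbf{q}) \right|}{N}
           - \lambda\bigl([0,\mathbf{q})\bigr) \right|

% Squared maximum mean discrepancy for a positive definite kernel k
\mathrm{MMD}_k^2(\mu,\nu)
  = \mathbb{E}_{X,X' \sim \mu}\bigl[k(X,X')\bigr]
  - 2\,\mathbb{E}_{X \sim \mu,\, Y \sim \nu}\bigl[k(X,Y)\bigr]
  + \mathbb{E}_{Y,Y' \sim \nu}\bigl[k(Y,Y')\bigr]
```

When $\mu$ is the empirical measure of a size-$m$ subset, the first expectation becomes a double sum over subset points weighted by $1/m^2$, which is the term a subset selection method controls.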