A Bayesian Approach to Low-Discrepancy Subset Selection

Nathan Kirk

TL;DR

This paper establishes, for the first time, that the subset selection problem with respect to kernel discrepancies is NP-hard, and proposes a Bayesian Optimization procedure for it that uses the recent notion of deep embedding kernels.

Abstract

Low-discrepancy designs play a central role in quasi-Monte Carlo methods and are increasingly influential in other domains such as machine learning, robotics and computer graphics. In recent years, one low-discrepancy construction method, called subset selection, has received a lot of attention. Given a large population of points, one selects a small subset that is optimal with respect to a discrepancy-based objective. Versions of this problem are known to be NP-hard. In this text, we establish, for the first time, that the subset selection problem with respect to kernel discrepancies is also NP-hard. Motivated by this intractability, we propose a Bayesian Optimization (BO) procedure for the subset selection problem that uses the recent notion of deep embedding kernels. We demonstrate the performance of the BO algorithm in minimizing discrepancy measures and note that the framework is broadly applicable to any design criterion.

Paper Structure (11 sections, 1 theorem, 31 equations, 3 figures, 1 algorithm)

This paper contains 11 sections, 1 theorem, 31 equations, 3 figures, 1 algorithm.

Introduction
The Subset Selection Problem
Our Contribution
Kernel Subset Selection is NP-Hard
Subset Selection via Bayesian Optimization
Deep Embedding Kernels
Bayesian Optimization for Subset Selection
Numerical Experiments
Experimental Setup
Results
Discussion

Key Result

Theorem 1

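For the inputs $P_N=\{\mathbf{X}_i\}_{i=1}^N\subset[0,1]^d$, an integer $m\le N$, a threshold $\tau>0$ and a positive definite kernel $k:[0,1]^d \times [0,1]^d \rightarrow \mathbb{R}$, the decision problem of whether there exists a subset of $P_N$ of size $m$ whose kernel discrepancy with respect to $k$ is at most $\tau$ is NP-hard.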
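To make the decision problem concrete, here is a minimal brute-force sketch in Python. It is not the paper's algorithm. It assumes a Gaussian kernel and measures the kernel discrepancy as the squared maximum mean discrepancy between the subset and the full population; the paper's exact discrepancy may differ. The names `gaussian_gram`, `squared_mmd`, `exists_subset` and the `bandwidth` parameter are hypothetical. The search enumerates all $\binom{N}{m}$ subsets, so it is only feasible for very small $N$, which is consistent with the hardness result.

```python
from itertools import combinations

import numpy as np


def gaussian_gram(points, bandwidth=0.5):
    """Gram matrix of a Gaussian kernel (an assumed choice of k)."""
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def squared_mmd(gram, subset):
    """Squared MMD between the subset and the full population (assumed objective)."""
    idx = np.asarray(subset)
    m, n = len(idx), gram.shape[0]
    within = gram[np.ix_(idx, idx)].sum() / m ** 2   # subset vs. subset
    cross = gram[idx, :].sum() / (m * n)             # subset vs. population
    total = gram.sum() / n ** 2                      # population vs. population
    return within - 2.0 * cross + total


def exists_subset(points, m, tau, bandwidth=0.5):
    """Exhaustively decide whether some size-m subset has MMD at most tau."""
    gram = gaussian_gram(points, bandwidth)
    best_subset, best_value = None, np.inf
    for subset in combinations(range(len(points)), m):
        value = squared_mmd(gram, subset)
        if value < best_value:
            best_subset, best_value = subset, value
    # MMD <= tau is equivalent to MMD^2 <= tau^2 for tau >= 0.
    return bool(best_value <= tau ** 2), best_subset, best_value


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    population = rng.random((12, 2))  # N = 12 points in [0,1]^2
    answer, subset, value = exists_subset(population, m=3, tau=0.2)
    print(answer, subset, value)
```

Even at $N=1000$ and $m=25$, the sizes used in the figures, exhaustive enumeration of this kind is far out of reach, which is the motivation the abstract gives for a heuristic search such as Bayesian Optimization.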

Figures (3)

Figure 1: Symmetric discrepancy minimization for $N=1000$ and $m=25$.
Figure 2: Maximum mean discrepancy minimization for a two-component Gaussian mixture using the random, GLS, BO-DS and BO-DE methods, for $N=1000$ and $m=25$. The resulting subsets for GLS (Middle) and BO-DE (Right).
Figure 3: $L_\infty$ star discrepancy minimization for $N=1000$ and $m=25$.

Theorems & Definitions (4)

Definition 1: $L_\infty$ Star Discrepancy
Definition 2: Maximum Mean Discrepancy
Theorem 1
proof
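
For reference, the two definitions listed above are usually stated as follows. These are the standard textbook forms and not necessarily the paper's exact notation; the symbols $\mathbf{q}$, $\lambda$ (Lebesgue measure), $P$ and $Q$ are assumptions introduced here for illustration.

```latex
% L_infinity star discrepancy of a point set P_N in [0,1]^d:
% the worst-case gap between the fraction of points in an anchored box
% [0, q) and the volume of that box.
D_\infty^*(P_N) = \sup_{\mathbf{q} \in [0,1]^d}
  \left| \frac{\left| P_N \cap [\mathbf{0}, \mathbf{q}) \right|}{N}
         - \lambda\big([\mathbf{0}, \mathbf{q})\big) \right|

% Squared maximum mean discrepancy between distributions P and Q
% for a positive definite kernel k, with X, X' ~ P and Y, Y' ~ Q independent.
\mathrm{MMD}^2(P, Q) = \mathbb{E}\big[k(X, X')\big]
  - 2\,\mathbb{E}\big[k(X, Y)\big]
  + \mathbb{E}\big[k(Y, Y')\big]
```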