
Optimizing Kernel Discrepancies via Subset Selection

Deyao Chen, François Clément, Carola Doerr, Nathan Kirk

TL;DR

This work extends subset selection from the traditional $L_\infty$ discrepancy to kernel discrepancies. Given a large pool of $n$ points, it extracts an $m$-point subset for both uniform and nonuniform targets, using $d^{\mathcal{H}}_{2,F}(P)$ and the kernel Stein discrepancy (KSD) as objectives. The method is a swap-based heuristic with fast updates: it relies on the decomposition $d^{\mathcal{H}}_{2,F}(P)^2 = c + \sum_{i,j} V(i,j)$ and an auxiliary $B$-array to compute the gain of a swap efficiently, and it uses restart and initialization schemes to escape local optima. Across $L_2$ star discrepancy and KSD objectives, the authors report that kernel-based subset selection often matches or surpasses existing $L_\infty$-focused methods in higher dimensions and substantially improves on Stein Points for nonuniform targets. The approach gives scalable, gradient-free optimization of discrepancy-based objectives and suggests applications such as MCMC thinning. Exact formulations and the known pathologies of KSD remain open directions. Overall, the work advances kernel-based design of low-discrepancy samples for both classical QMC and density-targeted sampling.
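To make the decomposition concrete, here is one way the classical $L_2$ star discrepancy of an $m$-point set $P = \{x^{(1)}, \dots, x^{(m)}\} \subset [0,1]^d$ can be written in the form $c + \sum_{i,j} V(i,j)$, starting from Warnock's closed-form expression. The assignment of terms to $c$ and $V(i,j)$ below is an illustrative assumption; the paper's own definition of $V$ may group the terms differently.

$$
L_2^*(P)^2 \;=\; \underbrace{3^{-d}}_{c} \;+\; \sum_{i=1}^{m}\sum_{j=1}^{m} V(i,j),
\qquad
V(i,j) \;=\; \frac{1}{m^2}\prod_{k=1}^{d}\bigl(1-\max(x^{(i)}_k, x^{(j)}_k)\bigr) \;-\; \delta_{ij}\,\frac{2^{1-d}}{m}\prod_{k=1}^{d}\bigl(1-(x^{(i)}_k)^2\bigr),
$$

where $\delta_{ij}$ is the Kronecker delta. Written this way, swapping one point of $P$ changes only the terms $V(i,j)$ in a single row and column, which is what makes a fast gain computation plausible.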

Abstract

Kernel discrepancies are a powerful tool for analyzing worst-case errors in quasi-Monte Carlo (QMC) methods. Building on recent advances in optimizing such discrepancy measures, we extend the subset selection problem to the setting of kernel discrepancies, selecting an $m$-element subset from a large population of size $n \gg m$. We introduce a novel subset selection algorithm applicable to general kernel discrepancies. It efficiently generates low-discrepancy samples both from the uniform distribution on the unit hypercube, the traditional setting of classical QMC, and from more general distributions $F$ with known density functions, in the latter case by employing the kernel Stein discrepancy. We also explore the relationship between the classical $L_2$ star discrepancy and its $L_\infty$ counterpart.
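The subset selection problem described above can be illustrated with a minimal sketch for the uniform case. It evaluates the squared $L_2$ star discrepancy with Warnock's formula and greedily accepts any single-point swap that lowers it. This is not the authors' algorithm: the function names `l2_star_sq` and `swap_select`, the random initialization, and the pass limit are assumptions made for illustration, and the objective is recomputed from scratch for every candidate instead of using a fast update.

```python
import random
from math import prod


def l2_star_sq(points):
    """Squared L2 star discrepancy of points in [0,1]^d (Warnock's formula)."""
    n = len(points)
    d = len(points[0])
    # Single-point term: sum_i prod_k (1 - x_ik^2)
    single = sum(prod(1.0 - x * x for x in p) for p in points)
    # Pairwise term: sum_{i,j} prod_k (1 - max(x_ik, x_jk))
    pair = sum(
        prod(1.0 - max(a, b) for a, b in zip(p, q))
        for p in points
        for q in points
    )
    return 3.0 ** (-d) - (2.0 ** (1 - d) / n) * single + pair / n ** 2


def swap_select(pool, m, seed=0, max_passes=20):
    """Naive swap heuristic (illustrative only, full recomputation per trial).

    Starts from a random m-subset of the pool and accepts any swap of one
    chosen point for one unchosen point that strictly lowers the objective.
    Stops at a local optimum or after max_passes sweeps.
    """
    rng = random.Random(seed)
    n = len(pool)
    chosen = rng.sample(range(n), m)
    best = l2_star_sq([pool[i] for i in chosen])
    for _ in range(max_passes):
        improved = False
        for pos in range(m):
            for cand in range(n):
                if cand in chosen:
                    continue
                trial = chosen.copy()
                trial[pos] = cand
                val = l2_star_sq([pool[i] for i in trial])
                if val < best:
                    chosen, best = trial, val
                    improved = True
        if not improved:
            break
    return sorted(chosen), best


if __name__ == "__main__":
    rng = random.Random(1)
    pool = [[rng.random(), rng.random()] for _ in range(30)]
    indices, value = swap_select(pool, m=6)
    print(indices, value)
```

Each trial here costs $O(m^2 d)$, so the sketch is only practical for small pools. The heuristic also stops at the first local optimum it reaches, which is the situation that restart schemes are meant to address.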

Paper Structure

This paper contains 30 sections, 21 equations, 2 figures, 6 tables, 1 algorithm.

Figures (2)

  • Figure 1: $L_{\infty}$ star discrepancy comparison in two dimensions of the different construction and optimization methods.
  • Figure 2: Subset-selected point sets with respect to KSD for $n=60$. (Left) Mixture of two Gaussians. (Right) Product of independent Beta distributions.

Theorems & Definitions (1)

  • Conjecture 5.1