Robust Online Sampling from Possibly Moving Target Distributions

François Clément, Stefan Steinerberger

TL;DR

This work introduces a simple, robust greedy scheme for online sampling: it adds points to an existing 1D point set so that the set uniformly approximates a target measure $\mu$ (or, after applying the CDF, the uniform measure on $[0,1]$), and each new point has the form $x_{n+1}=k/(n+1)$. By analyzing a continuous mean-field limit, the authors derive a functional $E(\mu)=\int_0^1 |\Phi^{-1}(t)-t|\,dt$ whose minimizers are fixed points of $\Phi$, and show the discrete process inherits a structure that avoids spurious local minima and yields fast $\mathcal{O}(n)$ computation per step. They connect the discrete energy to the Wasserstein-1 distance and Halász discrepancy bounds, provide explicit proofs and constructions (including a counterexample demonstrating non-monotone energy steps), and discuss related periodic-energy sequences with strong numerical performance. The approach extends naturally to general distributions via the CDF, with practical demonstrations illustrating uniformity improvements and competitive performance against classical quasi-random sequences. The results have implications for sequential sampling, Bayesian settings with unknown budgets, and discrepancy-optimal point distributions in one dimension.

Abstract

We suppose we are given a list of points $x_1, \dots, x_n \in \mathbb{R}$, a target probability measure $\mu$ and are asked to add additional points $x_{n+1}, \dots, x_{n+m}$ so that $x_1, \dots, x_{n+m}$ is as close as possible to the distribution of $\mu$; additionally, we want this to be true uniformly for all $m$. We propose a simple method that achieves this goal. It selects new points in regions where the existing set is lacking points and avoids regions that are already overly crowded. If we replace $\mu$ by another measure $\mu_2$ in the middle of the computation, the method dynamically adjusts and allows us to keep the original sampling points. $x_{n+1}$ can be computed in $\mathcal{O}(n)$ steps and we obtain state-of-the-art results. It appears to be an interesting dynamical system in its own right; we analyze a continuous mean-field version that reflects much of the same behavior.

Paper Structure

This paper contains 24 sections, 10 theorems, 128 equations, 12 figures, 1 algorithm.

Key Result

Theorem 1

Given $0 \leq x_1 \leq x_2 \leq \dots \leq x_n \leq 1$, the next point $x_{n+1}$ can be computed using $\mathcal{O}(n)$ operations. Moreover, $x_{n+1} = k/(n+1)$ is a rational number with $1 \leq k \leq n+1$. If $1 \leq k \leq n-1$, then $x_k < (k+1)/(n+1) < x_{k+1}$.
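
As a rough illustration of why the rational form $x_{n+1} = k/(n+1)$ is convenient, the sketch below searches only the $n+1$ candidates $k/(n+1)$ and keeps the one with the smallest energy. The energy used here is an assumption made for illustration (the Wasserstein-1 distance between the extended point set and the grid $i/(m+1)$, $m = n+1$), not the functional defined in the paper, and the names `w1_to_grid` and `greedy_next_point` are hypothetical. This brute-force version costs $\mathcal{O}(n^2)$ per step and does not reproduce the $\mathcal{O}(n)$ procedure of Theorem 1.

```python
from bisect import insort


def w1_to_grid(sorted_pts):
    """Assumed stand-in energy: mean |y_i - i/(m+1)| over the sorted points.

    In 1D this equals the Wasserstein-1 distance between the empirical
    measure of the points and that of the grid i/(m+1), i = 1..m.
    """
    m = len(sorted_pts)
    return sum(abs(y - i / (m + 1)) for i, y in enumerate(sorted_pts, start=1)) / m


def greedy_next_point(points):
    """Try every candidate k/(n+1), k = 1..n+1, and return the best one."""
    n = len(points)
    base = sorted(points)
    best_x, best_energy = None, float("inf")
    for k in range(1, n + 2):
        x = k / (n + 1)
        trial = base.copy()
        insort(trial, x)  # keep the trial set sorted after adding x
        energy = w1_to_grid(trial)
        if energy < best_energy:
            best_x, best_energy = x, energy
    return best_x


# Three points crowded near 0: the new point lands in the empty right part.
next_point = greedy_next_point([0.1, 0.2, 0.3])
```

With the crowded input above, the candidates are $1/4, 2/4, 3/4, 4/4$, and under this assumed energy the search returns $3/4$, matching the qualitative behavior described in the abstract: new points go where the existing set is lacking.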

Figures (12)

  • Figure 1: 500 uniformly distributed random variables followed by 500 new points: they initially fix gaps before, around point $\sim 700$, reaching an equilibrium and refining at smaller scales.
  • Figure 2: 50 iid random points in $[0,1]$ followed by 60 new points. The new points fill the gaps, the CDF converges quickly to $x$.
  • Figure 3: Using the method to extend a set of points on $\mathbb{R}$ with respect to an arbitrary probability measure $\mu$.
  • Figure 4: (Left): the algorithm working to fix things in $\operatorname{CDF}$-space. (Right, top): the new points being added in real space. (Right, bottom): the final set of 400 points.
  • Figure 5: Test your intuition: four examples of 1000 random points, their discrepancy function $\Delta(x)$ (black) and $x_{1001}$ (red).
  • ...and 7 more figures
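
The captions of Figures 3 and 4 refer to working in CDF-space for a general measure $\mu$ on $\mathbb{R}$. A minimal sketch of that reduction, under stated assumptions: existing points are mapped to $[0,1]$ by the CDF, a new point is chosen there, and it is mapped back by the inverse CDF. The target here is an exponential distribution chosen only because its CDF has a closed form. The selection rule `largest_gap_midpoint` is a placeholder and not the paper's rule, and all function names are hypothetical.

```python
import math


def exp_cdf(x):
    """CDF of the Exp(1) distribution (illustrative target measure)."""
    return 1.0 - math.exp(-x)


def exp_inv_cdf(u):
    """Inverse CDF of Exp(1), valid for 0 <= u < 1."""
    return -math.log(1.0 - u)


def largest_gap_midpoint(us):
    """Placeholder rule (NOT the paper's): midpoint of the widest gap in [0, 1]."""
    edges = [0.0] + sorted(us) + [1.0]
    left, right = max(zip(edges, edges[1:]), key=lambda ab: ab[1] - ab[0])
    return 0.5 * (left + right)


def extend_in_real_space(points, cdf, inv_cdf, choose_in_unit_interval):
    """Map points to CDF-space, pick a new point there, and map it back."""
    us = [cdf(x) for x in points]
    u_new = choose_in_unit_interval(us)
    return inv_cdf(u_new)


# Points clustered near 0 leave most of the Exp(1) mass uncovered,
# so the new point is placed further out in the tail.
new_x = extend_in_real_space([0.1, 0.2, 0.3], exp_cdf, exp_inv_cdf, largest_gap_midpoint)
```

Passing the selection rule as a parameter keeps the CDF transform separate from the choice of point, so any rule defined on $[0,1]$ can be swapped in without touching the mapping.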

Theorems & Definitions (17)

  • Theorem 1
  • Proposition 1
  • Theorem 2
  • Proposition 2
  • Theorem 3
  • proof
  • proof
  • proof
  • proof
  • Lemma 1
  • ...and 7 more