Super-Samples from Kernel Herding

Yutian Chen, Max Welling, Alex Smola

TL;DR

The herding algorithm is extended to continuous spaces using the kernel trick. Kernel herding is shown to decrease the error of expectations of functions in the Hilbert space at a rate O(1/T), much faster than the usual O(1/√T) for i.i.d. random samples.

Abstract

We extend the herding algorithm to continuous spaces by using the kernel trick. The resulting "kernel herding" algorithm is an infinite-memory deterministic process that learns to approximate a PDF with a collection of samples. We show that kernel herding decreases the error of expectations of functions in the Hilbert space at a rate O(1/T), which is much faster than the usual O(1/√T) for i.i.d. random samples. We illustrate kernel herding by approximating Bayesian predictive distributions.
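The abstract does not spell out the update rule, so the following is only a minimal sketch of the greedy step usually associated with kernel herding: pick the next sample to maximize the estimated mean kernel value under p minus the average kernel value against the samples chosen so far. Several things here are assumptions for illustration and not taken from the text above: the Gaussian kernel and its bandwidth `sigma`, the use of a finite candidate pool drawn from p as a stand-in for the exact expectation and for the continuous argmax, and the hypothetical function names `gaussian_kernel`, `kernel_herding`, and `mmd_to_pool`.

```python
import numpy as np


def gaussian_kernel(a, b, sigma=1.0):
    """Gram matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def kernel_herding(pool, num_samples, sigma=1.0):
    """Greedily select pool indices with a herding-style update.

    Assumption: `pool` holds draws from p and replaces both the exact
    expectation E_{x'~p} k(x, x') and the search over continuous x.
    """
    K = gaussian_kernel(pool, pool, sigma)
    mean_embedding = K.mean(axis=1)    # estimate of E_{x'~p} k(x, x')
    kernel_sum = np.zeros(len(pool))   # running sum_t k(x, x_t)
    chosen = []
    for t in range(num_samples):
        # Reward high-density regions, penalize closeness to past picks.
        scores = mean_embedding - kernel_sum / (t + 1)
        idx = int(np.argmax(scores))
        chosen.append(idx)
        kernel_sum += K[:, idx]
    return np.array(chosen)


def mmd_to_pool(pool, idx, sigma=1.0):
    """Distance in the Hilbert space between the pool mean and subset mean."""
    K = gaussian_kernel(pool, pool, sigma)
    sq = K.mean() - 2.0 * K[:, idx].mean() + K[np.ix_(idx, idx)].mean()
    return float(np.sqrt(max(sq, 0.0)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pool = rng.normal(size=(400, 2))
    herd_idx = kernel_herding(pool, 20)
    rand_idx = rng.choice(len(pool), size=20, replace=False)
    print("herding error:", mmd_to_pool(pool, herd_idx))
    print("random error: ", mmd_to_pool(pool, rand_idx))
```

The procedure is deterministic given the pool, and it may select the same pool point more than once. The error reported by `mmd_to_pool` is measured against the pool itself, not against the true distribution, so it only illustrates the comparison with random subsets under the assumptions stated above.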

Paper Structure

This paper contains 10 sections, 19 equations, 6 figures.

Figures (6)

  • Figure 1: First 20 samples from herding (red squares) versus i.i.d. random sampling (purple circles).
  • Figure 2: Error in estimating the expectation of four functions, by herding (blue) and random sampling (green) as a function of the number of samples. The decreasing speed of the upper bound of the error is shown on top of each figure.
  • Figure 3: Error in estimating the expectation of four functions by herding on the true distribution $p$ (red) and the empirical distribution $\mathfrak{D}$ (blue) as a function of the number of samples. The convergence rate of the error on $\mathfrak{D}$ (measured as slope of the upper bound of the herding error) is shown on top of each figure. The error of random sampling on $p$ (green) is also plotted for comparison.
  • Figure 4: Linear relationship between $1 / \mathcal{E}_{T}$ and $T$.
  • Figure 5: RMSE of the predicted probability of herding (blue) and a random subset (blue) w.r.t. the whole sample set.
  • ...and 1 more figure