Stein Random Feature Regression

Houston Warren, Rafael Oliveira, Fabio Ramos

TL;DR

This paper introduces Stein random features (SRF), which leverage Stein variational gradient descent (SVGD) both to generate high-quality RFF samples from known spectral densities and to flexibly and efficiently approximate spectral measure posteriors that are traditionally non-analytical.
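To make the SVGD ingredient concrete, here is a minimal NumPy sketch of the standard SVGD update applied to a known spectral density. It transports a set of frequency particles toward the spectral measure of an RBF kernel with lengthscale $\ell$, which is $\mathcal{N}(\mathbf{0}, \ell^{-2}\mathbf{I})$, using only $\nabla \log p(\boldsymbol{\omega}) = -\ell^{2}\boldsymbol{\omega}$. The function names, the RBF particle kernel with a median-heuristic bandwidth, the step size, and the iteration count are illustrative assumptions, not the authors' SRF implementation.

```python
import numpy as np


def svgd_step(particles, grad_log_p, step_size):
    """One standard SVGD update using an RBF particle kernel (median heuristic)."""
    n = particles.shape[0]
    # diffs[j, i] = x_j - x_i, shape (n, n, d)
    diffs = particles[:, None, :] - particles[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Assumed bandwidth heuristic for k(x, y) = exp(-||x - y||^2 / h)
    med = np.median(sq_dists)
    h = med / np.log(n + 1.0) if med > 0 else 1.0
    K = np.exp(-sq_dists / h)  # K[j, i] = k(x_j, x_i), symmetric
    grads = grad_log_p(particles)  # (n, d)
    # Driving term: sum_j k(x_j, x_i) * grad log p(x_j)
    drive = K.T @ grads
    # Repulsive term: sum_j grad_{x_j} k(x_j, x_i) = sum_j -(2/h)(x_j - x_i) k(x_j, x_i)
    repulse = np.sum(-(2.0 / h) * diffs * K[:, :, None], axis=0)
    return particles + step_size * (drive + repulse) / n


def rbf_spectral_grad_log_p(lengthscale):
    """Gradient of log N(w; 0, lengthscale^-2 I), the RBF kernel's spectral density."""
    return lambda w: -(lengthscale ** 2) * w


def svgd_frequencies(num_features, dim, lengthscale, steps=3000, step_size=0.2, seed=0):
    """Hypothetical helper: run SVGD from a deliberately poor initialisation."""
    rng = np.random.default_rng(seed)
    # Start far from the target mean of zero and far too concentrated
    w = rng.normal(loc=3.0, scale=0.5, size=(num_features, dim))
    grad_log_p = rbf_spectral_grad_log_p(lengthscale)
    for _ in range(steps):
        w = svgd_step(w, grad_log_p, step_size)
    return w


if __name__ == "__main__":
    W = svgd_frequencies(num_features=100, dim=2, lengthscale=0.5)
    # Target is N(0, 4 I): per-dimension mean near 0, standard deviation near 2
    print(W.mean(axis=0), W.std(axis=0))
```

The resulting particles can be used directly as RFF frequencies; the driving term pulls them toward high-density regions, while the repulsive term keeps them spread out, which is what distinguishes SVGD samples from plain i.i.d. Monte Carlo draws.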

Abstract

In large-scale regression problems, random Fourier features (RFFs) have significantly enhanced the computational scalability and flexibility of Gaussian processes (GPs) by defining kernels through their spectral density, from which a finite set of Monte Carlo samples can be used to form an approximate low-rank GP. However, the efficacy of RFFs in kernel approximation and Bayesian kernel learning depends on the ability to tractably sample the kernel spectral measure and on the quality of the generated samples. We introduce Stein random features (SRF), which leverage Stein variational gradient descent both to generate high-quality RFF samples from known spectral densities and to flexibly and efficiently approximate spectral measure posteriors that are traditionally non-analytical. SRFs require only the evaluation of log-probability gradients to perform both kernel approximation and Bayesian kernel learning, and they outperform traditional approaches on both tasks. We empirically validate the effectiveness of SRFs by comparing them with baselines on kernel approximation and well-known GP regression problems.
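The abstract's first sentence describes turning a finite set of frequency samples into an approximate low-rank GP. One common way to do this is the weight-space view: Bayesian linear regression on the features $\phi(\mathbf{x}) = m^{-1/2}[\cos(\mathbf{W}\mathbf{x}), \sin(\mathbf{W}\mathbf{x})]$, giving predictive mean $\phi_*^\top \mathbf{A}^{-1}\boldsymbol{\Phi}^\top \mathbf{y}$ and latent variance $\sigma_n^2 \phi_*^\top \mathbf{A}^{-1}\phi_*$ with $\mathbf{A} = \boldsymbol{\Phi}^\top\boldsymbol{\Phi} + \sigma_n^2 \mathbf{I}$. The sketch below assumes a unit-amplitude kernel, a Gaussian likelihood with noise variance $\sigma_n^2$, and hypothetical helper names; it is not the paper's SRF regression model.

```python
import numpy as np


def rff_features(X, W):
    """Map inputs X (n, d) to 2m random Fourier features using frequencies W (m, d)."""
    proj = X @ W.T  # (n, m)
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(W.shape[0])


def rff_gp_predict(X, y, X_test, W, noise_var):
    """Low-rank GP prediction via Bayesian linear regression on RFFs (weights ~ N(0, I))."""
    Phi = rff_features(X, W)
    Phi_s = rff_features(X_test, W)
    A = Phi.T @ Phi + noise_var * np.eye(Phi.shape[1])
    L = np.linalg.cholesky(A)  # A = L L^T
    # alpha = A^{-1} Phi^T y, computed with two triangular solves
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Phi.T @ y))
    mean = Phi_s @ alpha
    # Latent variance: noise_var * phi_*^T A^{-1} phi_* = noise_var * ||L^{-1} phi_*||^2
    V = np.linalg.solve(L, Phi_s.T)
    var = noise_var * np.sum(V ** 2, axis=0)
    return mean, var


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    lengthscale = 1.0
    # i.i.d. frequencies from the RBF spectral density N(0, lengthscale^-2 I)
    W = rng.normal(scale=1.0 / lengthscale, size=(200, 1))
    X = np.linspace(-3, 3, 50)[:, None]
    y = np.sin(X[:, 0])
    X_test = np.linspace(-2.5, 2.5, 20)[:, None]
    mean, var = rff_gp_predict(X, y, X_test, W, noise_var=1e-2)
    print(np.max(np.abs(mean - np.sin(X_test[:, 0]))))
```

Because all computation happens in the $2m$-dimensional feature space, the cost scales with the number of features rather than cubically in the number of training points, which is the scalability benefit the abstract refers to. Any frequency set, such as SVGD particles, can be passed in as `W`.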

Paper Structure

This paper contains 39 sections, 2 theorems, 61 equations, 11 figures, 1 table, and 2 algorithms.

Key Result

Theorem 1 (Bochner's theorem)

A continuous shift-invariant kernel $k(\mathbf{x}, \mathbf{x}^\prime) = k(\mathbf{x} - \mathbf{x}^\prime)$ on $\mathbb{R}^d$ is positive definite if and only if it is the Fourier transform of a non-negative measure.
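Bochner's theorem is what makes random Fourier features possible: if the spectral measure is normalised to a probability density $p(\boldsymbol{\omega})$, then $k(\mathbf{x} - \mathbf{x}^\prime) = \mathbb{E}_{\boldsymbol{\omega} \sim p}[\cos(\boldsymbol{\omega}^\top(\mathbf{x} - \mathbf{x}^\prime))]$, which can be estimated by Monte Carlo. The NumPy sketch below checks this for the RBF kernel $k(\boldsymbol{\tau}) = \exp(-\lVert\boldsymbol{\tau}\rVert^2 / (2\ell^2))$, whose spectral density is $\mathcal{N}(\mathbf{0}, \ell^{-2}\mathbf{I})$; the helper names and test inputs are assumptions for illustration.

```python
import numpy as np


def rbf_kernel(X, Y, lengthscale):
    """Exact RBF kernel k(x - y) = exp(-||x - y||^2 / (2 lengthscale^2))."""
    sq = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq / lengthscale ** 2)


def monte_carlo_kernel(X, Y, W):
    """Bochner-based estimate (1/m) sum_i cos(w_i^T (x - y)) from m frequencies W (m, d)."""
    px, py = X @ W.T, Y @ W.T
    # cos(a - b) = cos(a) cos(b) + sin(a) sin(b)
    return (np.cos(px) @ np.cos(py).T + np.sin(px) @ np.sin(py).T) / W.shape[0]


def approximation_error(num_features, X, lengthscale, seed=0):
    """Max absolute error of the Monte Carlo kernel against the exact RBF kernel."""
    rng = np.random.default_rng(seed)
    # For the RBF kernel the normalised spectral measure is N(0, lengthscale^-2 I)
    W = rng.normal(scale=1.0 / lengthscale, size=(num_features, X.shape[1]))
    K_exact = rbf_kernel(X, X, lengthscale)
    K_mc = monte_carlo_kernel(X, X, W)
    return np.max(np.abs(K_exact - K_mc)), K_mc


if __name__ == "__main__":
    X = np.random.default_rng(42).uniform(-1, 1, size=(30, 3))
    for m in (50, 500, 20000):
        err, _ = approximation_error(m, X, lengthscale=0.7)
        print(m, err)
```

With i.i.d. frequencies the error shrinks only at the Monte Carlo rate $O(m^{-1/2})$, which is why the quality of the frequency samples matters for kernel approximation.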

Figures (11)

  • Figure 1: Comparison of Traditional RFF Kernel Learning to an M-SRFR Posterior With $M = 8$ Components
  • Figure 2: Kernel Approximation Error and Standard Deviations Over 10 Random Seeds.
  • Figure 3: AUSWAVE Dataset and Error with Contributed Methods in Blue.
  • Figure 4: airfoil Learned Kernels by Dimension.
  • Figure 5: Selection of Single SSGP and M-SRFR Predictive Distributions for airfoil Test Points.
  • ...and 6 more figures

Theorems & Definitions (5)

  • Theorem 1: Bochner's theorem (Rudin, 2011)
  • Theorem 2
  • Definition 3: Mixture Stein Random Feature Regression
  • Definition 4: Functional gradient
  • Proof: proof of the KL-gradient theorem