On the design of scalable, high-precision spherical-radial Fourier features

Ayoub Belhadji; Qianyu Julie Zhu; Youssef Marzouk

TL;DR

This paper tackles scalable kernel approximation by designing spherical-radial Fourier features for the squared exponential kernel. It decomposes the Gaussian integral into radial and spherical components and proposes a tensor-product quadrature: Gauss-Laguerre nodes for the radial part, spherical nodes chosen by Monte Carlo or orthogonal Monte Carlo (OMC) schemes, and, optionally, weights obtained by optimal kernel quadrature (OKQ). The authors derive explicit bounds on the expected squared error that separate the radial and spherical contributions, and show that orthogonal spherical designs and kernel-weight optimization yield exponential or near-exponential convergence in practice, especially when the number of features is modest relative to the dimension. In numerical experiments, SR-OMC and OKQ-SOMC outperform existing methods across multiple datasets and remain robust to dimension, kernel bandwidth, and dataset diameter. The work also gives practical guidance on balancing radial and spherical nodes to minimize total error and suggests extensions to other shift-invariant kernels.

Abstract

Approximation using Fourier features is a popular technique for scaling kernel methods to large-scale problems, with myriad applications in machine learning and statistics. This method replaces the integral representation of a shift-invariant kernel with a sum using a quadrature rule. The design of the latter is meant to reduce the number of features required for high-precision approximation. Specifically, for the squared exponential kernel, one must design a quadrature rule that approximates the Gaussian measure on $\mathbb{R}^d$. Previous efforts in this line of research have faced difficulties in higher dimensions. We introduce a new family of quadrature rules that accurately approximate the Gaussian measure in higher dimensions by exploiting its isotropy. These rules are constructed as a tensor product of a radial quadrature rule and a spherical quadrature rule. Compared to previous work, our approach leverages a thorough analysis of the approximation error, which suggests natural choices for both the radial and spherical components. We demonstrate that this family of Fourier features yields improved approximation bounds.
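The tensor-product construction described in the abstract can be illustrated with a minimal NumPy sketch. It is written under stated assumptions and is not necessarily the authors' exact construction: the radial rule is taken to be a generalized Gauss-Laguerre rule (obtained from the chi law of the radius via the substitution $t = r^2/2$ and computed with the Golub-Welsch eigenvalue method), the spherical nodes are blocks of Haar-distributed orthogonal directions with equal weights, and the kernel is $\exp(-\|x-y\|^2/(2\sigma^2))$. The function names `radial_rule`, `orthogonal_directions`, and `sr_features` are hypothetical.

```python
import numpy as np


def radial_rule(d, m_r):
    """Radial nodes r_i and normalized weights a_i (assumed Gauss-Laguerre rule).

    With t = r^2 / 2, the chi_d law of the radius becomes a Gamma(d/2, 1) law,
    so a generalized Gauss-Laguerre rule with alpha = d/2 - 1 applies.
    Golub-Welsch: nodes are the eigenvalues of the Jacobi matrix, and the
    normalized weights are the squared first components of its eigenvectors.
    """
    alpha = d / 2.0 - 1.0
    k = np.arange(m_r)
    diag = 2.0 * k + alpha + 1.0
    off = np.sqrt(k[1:] * (k[1:] + alpha))
    jacobi = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    t, vecs = np.linalg.eigh(jacobi)
    weights = vecs[0, :] ** 2  # sums to 1
    return np.sqrt(2.0 * t), weights


def orthogonal_directions(d, m_s, rng):
    """m_s unit vectors in R^d, mutually orthogonal within each block of d."""
    blocks = []
    for _ in range(-(-m_s // d)):  # ceil(m_s / d) blocks
        q, r = np.linalg.qr(rng.standard_normal((d, d)))
        q = q * np.sign(np.diag(r))  # sign fix so q is Haar distributed
        blocks.append(q.T)
    return np.vstack(blocks)[:m_s]


def sr_features(X, radii, weights, dirs, sigma):
    """Feature map Phi with Phi(x) . Phi(y) approximating the SE kernel."""
    proj = (X @ dirs.T) / sigma                       # (n, m_s)
    phase = proj[:, None, :] * radii[None, :, None]   # (n, m_r, m_s)
    scale = np.sqrt(weights / dirs.shape[0])[None, :, None]
    feats = np.concatenate(
        [scale * np.cos(phase), scale * np.sin(phase)], axis=1
    )
    return feats.reshape(X.shape[0], -1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, m_r, m_s, sigma = 8, 4, 16, 1.0
    X = 0.05 * rng.standard_normal((5, d))
    radii, weights = radial_rule(d, m_r)
    dirs = orthogonal_directions(d, m_s, rng)
    Phi = sr_features(X, radii, weights, dirs, sigma)
    sq_dists = ((X[:, None] - X[None]) ** 2).sum(-1)
    K_exact = np.exp(-sq_dists / (2 * sigma**2))
    print("max abs error:", np.max(np.abs(Phi @ Phi.T - K_exact)))
```

The map produces $2 M_R M_S$ features per point. Because cosine and sine features are paired, the approximate kernel equals 1 on the diagonal by construction; replacing `orthogonal_directions` with independently normalized Gaussian vectors would give the plain Monte Carlo variant on the sphere.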

Paper Structure (40 sections, 11 theorems, 167 equations, 9 figures)

Introduction
Notation
Main results
A family of spherical-radial quadrature rules
On the design of the spherical-radial quadrature rule
The radial quadrature rule
The spherical quadrature rule
Monte Carlo on $\mathbb{S}^{d-1}$
The orthogonal Monte Carlo quadrature on $\mathbb{S}^{d-1}$
The optimal kernel quadrature
Related work
Vanilla random Fourier features (RFF)
QMC Fourier features
Stochastic spherical rules and orthogonal Fourier features
Numerical simulations
...and 25 more sections

Key Result

Proposition 1

For $x,y \in \mathbb{R}^{d}$, we have where
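
For context, the classical identity that underlies any spherical-radial rule for the squared exponential kernel can be written as follows. This is a standard fact stated in assumed notation ($\sigma$ for the bandwidth, $\bar{\sigma}_{d-1}$ for the uniform probability measure on $\mathbb{S}^{d-1}$, $p_d$ for the density of the chi distribution with $d$ degrees of freedom); it is not necessarily the exact statement of Proposition 1.

$$
\exp\!\left(-\frac{\|x-y\|^{2}}{2\sigma^{2}}\right)
= \mathbb{E}_{w \sim \mathcal{N}(0, I_d)}\!\left[\cos\!\left(\frac{w^{\top}(x-y)}{\sigma}\right)\right]
= \int_{0}^{\infty}\!\int_{\mathbb{S}^{d-1}} \cos\!\left(\frac{r\,\theta^{\top}(x-y)}{\sigma}\right) \mathrm{d}\bar{\sigma}_{d-1}(\theta)\, p_d(r)\, \mathrm{d}r,
\qquad
p_d(r) = \frac{r^{d-1} e^{-r^{2}/2}}{2^{d/2-1}\,\Gamma(d/2)}.
$$

A tensor-product rule with radial nodes and weights $(r_i, a_i)_{i=1}^{M_R}$ and spherical nodes and weights $(\theta_j, b_j)_{j=1}^{M_S}$ then replaces the double integral by the finite sum

$$
\sum_{i=1}^{M_R}\sum_{j=1}^{M_S} a_i\, b_j \cos\!\left(\frac{r_i\,\theta_j^{\top}(x-y)}{\sigma}\right).
$$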

Figures (9)

Figure 1: Monte Carlo on $\mathbb{S}^{d-1}$ versus Orthogonal Monte Carlo on $\mathbb{S}^{d-1}$. Dimensions from left to right: 2, 4, 8, 16, 32.
Figure 2: Relative error of different kernel approximation schemes for the dataset Powerplant. Shaded regions indicate sample standard deviation of the relative error, computed over 20 independent runs of each method. Radial nodes $M_R$ are fixed and the number of spherical nodes $M_S$ changes.
Figure 3: Relative error of kernel approximation schemes for increasing number of radial nodes $M_R$.
Figure 4: Kernel approximation error on 4 datasets. SSR has slightly different bins on the x-axis due to its specific spherical-radial construction.
Figure 5: Kernel approximation error on 4 datasets. SSR has slightly different bins on the x-axis due to its specific spherical-radial construction.
...and 4 more figures

Theorems & Definitions (16)

Definition 1
Proposition 1
Proposition 2
Proposition 3
Proposition 4
Theorem 2.1
Proposition 5
Theorem 2.2
Lemma C.1
proof
...and 6 more
