Lengthscale-informed sparse grids for kernel methods in high dimensions

Elliot J. Addy, Jonas Latz, Aretha L. Teckentrup

TL;DR

The work introduces lengthscale-informed sparse grids (LISGs) to overcome the curse of dimensionality in kernel interpolation and Gaussian process emulation by embedding axis-wise lengthscale anisotropy into the sparse-grid design. A novel LISG construction and its associated operator $P_{L,\boldsymbol{\nu},\mathbf{p}}$ yield dimension-robust error bounds in the native space of separable Matérn kernels, with explicit counts of evaluation points $N_{d,\mathbf{p}}(L)$. The analysis derives predictive variance bounds for GP emulation and provides a fast, scalable implementation based on the sparse-grid combination technique, enabling experiments up to $d=100$. Numerical results show superior accuracy-efficiency of LISGs compared with isotropic sparse grids and Monte Carlo sampling in highly anisotropic settings, without requiring anisotropic regularity of the target function. The framework handles high-dimensional problems by exploiting lengthscale anisotropy through a penalty vector, offering dimension-robust performance and practical applicability in GP surrogate modelling.

Abstract

Kernel interpolation, especially in the context of Gaussian process emulation, is a widely used technique in surrogate modelling, where the goal is to cheaply approximate an input-output map using a limited number of function evaluations. However, in high-dimensional settings, such methods typically suffer from the curse of dimensionality; the number of required evaluations to achieve a fixed approximation error grows exponentially with the input dimension. To overcome this, a common technique used in high-dimensional approximation methods, such as quasi-Monte Carlo and sparse grids, is to exploit functional anisotropy: the idea that some input dimensions are more 'sensitive' than others. In doing so, such methods can significantly reduce the dimension dependence in the error. In this work, we propose a generalisation of sparse grid methods that incorporates a form of anisotropy encoded by the lengthscale parameter in Matérn kernels. We derive error bounds and perform numerical experiments that show that our approach enables effective emulation over arbitrarily high dimensions for functions exhibiting sufficient anisotropy.

Paper Structure

This paper contains 16 sections, 32 theorems, 115 equations, 9 figures, 1 algorithm.

Key Result

Proposition 1

For all $\mathbf{x}\in\Omega$, $s_{\mathcal{X},\varphi}(f)(\mathbf{x}) = m_{\mathcal{X}}^f(\mathbf{x})$.
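A minimal numerical sketch of this identity, reading $s_{\mathcal{X},\varphi}(f)$ as the kernel interpolant of $f$ on the design $\mathcal{X}$ and $m_{\mathcal{X}}^f$ as the posterior mean of a zero-mean Gaussian process conditioned on the same data (this reading of the notation is an assumption, as are the test function, the lengthscales, the random design and the helper name `matern32_separable`). The two quantities are computed by different linear-algebra routes and then compared.

```python
import numpy as np

def matern32_separable(X, Y, lengthscales):
    """Separable (product) Matern kernel with nu = 3/2 in every dimension."""
    # Per-dimension scaled distances, shape (n, m, d)
    r = np.abs(X[:, None, :] - Y[None, :, :]) / lengthscales
    return np.prod((1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r), axis=-1)

rng = np.random.default_rng(0)
d = 3
lengthscales = np.array([1.0, 2.0, 4.0])          # hypothetical, growing lengthscales
f = lambda X: np.sin(X @ (1.0 / lengthscales))    # hypothetical target function

X_design = rng.uniform(size=(20, d))              # illustrative design, not a sparse grid
X_test = rng.uniform(size=(5, d))
y = f(X_design)

K = matern32_separable(X_design, X_design, lengthscales)
K_star = matern32_separable(X_test, X_design, lengthscales)

# Route 1: kernel interpolant, s(x) = sum_i alpha_i k(x, x_i) with K alpha = y
alpha = np.linalg.solve(K, y)
interpolant = K_star @ alpha

# Route 2: zero-mean GP posterior mean via Cholesky-based conditioning
chol = np.linalg.cholesky(K)
z = np.linalg.solve(chol, y)
posterior_mean = np.linalg.solve(chol, K_star.T).T @ z

max_gap = np.max(np.abs(interpolant - posterior_mean))
print(f"largest difference at test points: {max_gap:.2e}")
```

Both routes evaluate $k(\mathbf{x},\mathcal{X})K^{-1}f(\mathcal{X})$, so the difference should sit at the level of floating-point round-off.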

Figures (9)

  • Figure 1: Illustrated cross-sections of an anisotropic function satisfying the anisotropic condition, in which dimensions are ordered such that the lengthscales grow, $\lambda_j\leq\lambda_{j+1}$. We observe that the function exhibits much more variation in its first parameter, $x_1$, than its last, $x_d$.
  • Figure 2: The nested point-sets $\mathcal{X}_l$, (a), and $\mathcal{X}_l^p$ with penalty $p=2$, (b), for different levels, $l\in\mathbb{N}_0$.
  • Figure 3: Component diagrams for constructing two-dimensional sparse grids. A sparse grid is the union of all Cartesian-product grids, $\mathcal{X}_{l_1}^{p_1}\times\cdots\times\mathcal{X}_{l_d}^{p_d}$, each corresponding to a multi-index $\boldsymbol{l}\in\mathcal{I}_L^{d}$. The highlighted components in (b) show the components of (a) mapped onto the isotropic component diagram.
  • Figure 4: Here we can see how a lengthscale-informed sparse grid, (b), is simply an isotropic sparse grid, (a), of the same level, $L$, stretched by the penalty vector, $\mathbf{p}$, where points outside the domain are then excluded. Since $p_2>p_1$, $f$ is assumed to be more sensitive in $x_1$, and so more points are placed in the horizontal direction.
  • Figure 5: Relative $L^2$-error in approximating realisations of $f$ when (a) interpolating with isotropic kernels ($\boldsymbol{\lambda}=\mathbf{1}$) on Monte Carlo (MC) and standard sparse grid (SG) designs, and (b) interpolating with anisotropic kernels ($\boldsymbol{\lambda}=2^\mathbf{p}$) on Monte Carlo (LIMC) and lengthscale-informed sparse grid designs, based on uniformly spaced points (LISG), uniformly spaced points including boundary points (B-LISG), and Clenshaw-Curtis points (CC-LISG). We consider input dimensions 3 and 8, both linearly and logarithmically growing penalties, and employ separable Matérn kernels with $\nu_j=1.5$ for all $1\leq j \leq d$.
  • ...and 4 more figures
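
The construction described in the captions of Figures 2 to 4 can be sketched as follows. This is an illustration under stated assumptions, not the paper's definitions: the one-dimensional point sets are taken to be nested uniform points on $(0,1)$ without boundary, the penalised set $\mathcal{X}_l^p$ is assumed to stay at the coarsest level until $l$ exceeds $p$, and the index set $\mathcal{I}_L^d$ is assumed to be $\{\boldsymbol{l}\in\mathbb{N}_0^d : l_1+\cdots+l_d\leq L\}$. All function names are hypothetical.

```python
from itertools import product

def points_1d(level):
    """Assumed nested uniform points on (0, 1): i / 2**(level + 1), boundary excluded."""
    n = 2 ** (level + 1)
    return [i / n for i in range(1, n)]

def penalised_points_1d(level, penalty):
    """Assumed penalised set: refinement only begins once the level exceeds the penalty."""
    return points_1d(max(level - penalty, 0))

def sparse_grid(L, penalties):
    """Union of product grids over multi-indices l with l_1 + ... + l_d <= L (assumed index set)."""
    d = len(penalties)
    grid = set()
    for l in product(range(L + 1), repeat=d):
        if sum(l) > L:
            continue  # multi-index lies outside the assumed index set
        axes = [penalised_points_1d(lj, pj) for lj, pj in zip(l, penalties)]
        grid.update(product(*axes))  # add the Cartesian-product component grid
    return grid

isotropic = sparse_grid(3, (0, 0))   # zero penalties: isotropic sparse grid
informed = sparse_grid(3, (0, 2))    # second axis penalised, as when p_2 > p_1
print(len(isotropic), len(informed))
```

Under these assumptions the penalised grid is a subset of the isotropic grid of the same level, with fewer points along the penalised axis, which matches the qualitative picture in Figure 4. The point counts produced here need not agree with the paper's $N_{d,\mathbf{p}}(L)$, since the exact point sets may differ.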

Theorems & Definitions (77)

  • Definition 1
  • Definition 2
  • Proposition 1: Section 6.2, Rasmussen2005, Theorem 1, Scholkopf2001
  • Proposition 2: Chapter 10, wendland_2004
  • Definition 3: Lord_Powell_Shardlow_2014
  • Proposition 3: Corollary 10.13, wendland_2004
  • Remark 1
  • Proposition 4: Theorem 10.47, wendland_2004
  • Proposition 5
  • Definition 4: See e.g. Teckentrup2020
  • ...and 67 more