Optimal Kernel Learning for Gaussian Process Models with High-Dimensional Input

Lulu Kang, Minshen Xu

TL;DR

This work tackles GP regression in high dimensions by learning an optimal kernel as a convex combination of low-dimensional kernels, effectively performing a functional ANOVA decomposition that yields an additive GP. The authors connect kernel learning to optimal design, proving a discrete optimal kernel exists and developing a forward-stepwise Fedorov-Wynn-type algorithm with a sparsity-promoting heredity principle. They provide a rigorous General Equivalence Theorem, an optimal weight update scheme, and practical tuning guidelines, together with a staged construction of basic kernels to manage combinatorial growth. Empirical studies on Michalewicz, Borehole, and satellite-drag problems show improved active-variable identification and predictive accuracy relative to MLE GP, local GP, and MRFA, highlighting the method’s interpretability and applicability to engineering surrogates in high dimensions.

Abstract

Gaussian process (GP) regression is a popular surrogate modeling tool for computer simulations in engineering and scientific domains. However, it often struggles with high computational costs and low prediction accuracy when the simulation involves too many input variables. For some simulation models, the outputs may only be significantly influenced by a small subset of the input variables, referred to as the "active variables". We propose an optimal kernel learning approach to identify these active variables, thereby overcoming GP model limitations and enhancing system understanding. Our method approximates the original GP model's covariance function through a convex combination of kernel functions, each utilizing low-dimensional subsets of input variables. Inspired by the Fedorov-Wynn algorithm from optimal design literature, we develop an optimal kernel learning algorithm to determine this approximation. We incorporate the effect heredity principle, a concept borrowed from the field of "design and analysis of experiments", to ensure sparsity in active variable selection. Through several examples, we demonstrate that the proposed method outperforms alternative approaches in correctly identifying active input variables and improving prediction accuracy. It is an effective solution for interpreting the surrogate GP regression and simplifying the complex underlying system.
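
To make the idea of a covariance built as a convex combination of low-dimensional kernels concrete, here is a minimal sketch. It is not the authors' implementation: the function names `rbf_kernel` and `combined_kernel`, the choice of a squared-exponential basic kernel, the shared length-scale, and the hand-picked `subsets` and `weights` are all assumptions made for illustration. In the paper the weights are learned by the proposed algorithm, whereas here they are simply fixed.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    """Squared-exponential kernel between the rows of X and Y (assumed basic kernel)."""
    sq_dist = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def combined_kernel(X, Y, subsets, weights, lengthscale=1.0):
    """Convex combination of kernels, each acting on a small subset of input columns.

    subsets: iterable of tuples of column indices (hypothetical candidate set)
    weights: non-negative numbers summing to one (fixed here, not learned)
    """
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    K = np.zeros((X.shape[0], Y.shape[0]))
    for w, subset in zip(weights, subsets):
        cols = list(subset)
        # Each basic kernel only sees its own low-dimensional slice of the inputs.
        K += w * rbf_kernel(X[:, cols], Y[:, cols], lengthscale)
    return K

# Toy data: 20 points in 6 dimensions, but only variables 0 and 1 enter the kernel.
rng = np.random.default_rng(0)
X = rng.uniform(size=(20, 6))
subsets = [(0,), (1,), (0, 1)]   # two main effects and their interaction
weights = [0.5, 0.3, 0.2]
K = combined_kernel(X, X, subsets, weights)
```

Because every basic kernel in this sketch ignores columns 2 through 5, the resulting covariance matrix is unchanged when those columns are perturbed. This is the sense in which a sparse set of nonzero weights singles out the active variables.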

Paper Structure

This paper contains 23 sections, 7 theorems, 51 equations, 8 figures, 8 tables, 3 algorithms.

Key Result

Proposition 1

The directional derivative of $Q_{\eta}(\xi)$ in the direction of $\xi^{\prime}$ is given as, where $\bm K_{\xi}$ and $\bm K_{\xi'}$ are the $n\times n$ kernel matrices computed by evaluating $K(\xi)$ and $K(\xi')$ on $\mathcal{X}=\{\bm x_i\}_{i=1}^n$.

Figures (8)

  • Figure 1: Leave-one-out cross-validation error w.r.t. $\eta$.
  • Figure 2: The true (left) and predicted surface by optK (right) of the Michalewicz function.
  • Figure 3: Michalewicz function: boxplots and dotplot of standard RMSEs for $p=2$ and $d=6$.
  • Figure 4: Michalewicz function: boxplots of standard RMSEs for $p=6$, $d=60$, and $n=300,500$.
  • Figure 5: Borehole function: boxplots of standard RMSEs for $d=60$ and $n=200, 500$.
  • ...and 3 more figures

Theorems & Definitions (14)

  • Definition 1: Directional Derivative w.r.t. Kernel
  • Definition 2: Directional Derivative w.r.t. Design
  • Proposition 1
  • Lemma 1
  • Theorem 1: General Equivalence Theorem
  • Lemma 2
  • Theorem 2: Convergence of the Fedorov-Wynn-type Algorithm
  • Corollary 1: Conditions of Optimal Weights
  • Theorem 3
  • proof
  • ...and 4 more