Solving Approximation Tasks with Greedy Deep Kernel Methods

Marian Klink, Tobias Ehring, Robin Herkert, Robin Lautenschlager, Dominik Göddeke, Bernard Haasdonk

TL;DR

This work introduces deep, multilayer kernels for greedy approximation, and presents numerical investigations and comparisons with neural networks, which clearly show the advantages in terms of approximation accuracies.

Abstract

Kernel methods are versatile tools for function approximation and surrogate modeling. In particular, greedy techniques offer computational efficiency and reliability through inherent sparsity and provable convergence. Inspired by the success of deep neural networks and structured deep kernel networks, we consider deep, multilayer kernels for greedy approximation. This multilayer structure, consisting of linear kernel layers and optimizable kernel activation function layers in an alternating fashion, increases the expressiveness of the kernels and thus of the resulting approximants. Compared to standard kernels, deep kernels are able to adapt kernel intrinsic shape parameters automatically, incorporate transformations of the input space and induce a data-dependent reproducing kernel Hilbert space. For this, deep kernels need to be pretrained using a specifically tailored optimization objective. In this work, we not only introduce deep kernel greedy models, but also present numerical investigations and comparisons with neural networks, which clearly show the advantages in terms of approximation accuracies. As applications we consider the approximation of model problems, the prediction of breakthrough curves for reactive flow through porous media and the approximation of solutions for parameterized ordinary differential equation systems.
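
The sparsity that greedy kernel techniques provide can be illustrated with a small sketch. The code below is a minimal, hedged example of residual-based (f-greedy) center selection with a standard Gaussian kernel; it is not the authors' implementation. The names `gaussian_kernel`, `f_greedy` and `predict`, the shape parameter `eps=5.0` and the sine target are assumptions made for illustration, and the linear system is naively re-solved in every iteration rather than updated incrementally.

```python
import numpy as np


def gaussian_kernel(X, Y, eps=1.0):
    """Gaussian kernel matrix with entries exp(-eps^2 * ||x - y||^2)."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-(eps ** 2) * sq_dists)


def f_greedy(X, y, kernel, max_centers=20, tol=1e-8):
    """Greedily pick centers where the current residual is largest."""
    selected = []
    coeffs = np.zeros(0)
    residual = np.array(y, dtype=float)
    for _ in range(max_centers):
        abs_res = np.abs(residual)
        if selected:
            abs_res[selected] = 0.0  # never pick the same center twice
        idx = int(np.argmax(abs_res))
        if abs_res[idx] < tol:
            break
        selected.append(idx)
        centers = X[selected]
        # Naive re-solve per iteration: simple, but not efficient.
        coeffs = np.linalg.solve(kernel(centers, centers), y[selected])
        residual = y - kernel(X, centers) @ coeffs
    return selected, coeffs


def predict(X_new, X, selected, coeffs, kernel):
    """Evaluate the sparse kernel approximant at new points."""
    return kernel(X_new, X[selected]) @ coeffs


# Hypothetical usage: approximate a sine with at most 12 of 200 candidates.
X = np.linspace(0.0, 1.0, 200)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
kernel = lambda A, B: gaussian_kernel(A, B, eps=5.0)
selected, coeffs = f_greedy(X, y, kernel, max_centers=12)
max_err = np.max(np.abs(predict(X, X, selected, coeffs, kernel) - y))
```

The resulting approximant depends only on the selected centers, which is the sparsity the abstract refers to. In a deep kernel variant, `kernel` would be replaced by a pretrained multilayer kernel.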

Paper Structure

This paper contains 23 sections, 39 equations, 11 figures, 5 tables.

Figures (11)

  • Figure 1: Sketch of the greedy deep kernel approximation procedure: First, the inner centers and the trainable parameters are initialized. Second, the trainable parameters of the deep kernel are optimized over several epochs of stochastic batch optimization. Third and finally, the trained deep kernel is used inside the greedy algorithm (VKOGA), where each greedy iteration selects a new greedy center and updates the approximant.
  • Figure 2: Sketch of a propagating function $F_{L-1}: \mathbb{R}^{d_0} \to \mathbb{R}^{d_{L-1}}$ corresponding to an $L$-layer deep kernel $k^{(L)}:\mathbb{R}^{d_0} \times \mathbb{R}^{d_0} \to \mathbb{R}^{d_L \times d_L}$. The input dimensions are marked in green, the hidden dimensions in blue and the output dimensions in red. The weight matrices $W_\ell$, $\ell$ odd, contain the optimizable parameters of the linear kernel layers. The weight matrices $A_\ell$, $\ell$ even, contain the optimizable parameters of the kernel activation layers.
  • Figure 3: Visualization of shallow and $4$-layer deep kernels based on the Gaussian kernel with $\epsilon=1$ and the linear Matérn kernel with $\epsilon=1$ from Table \ref{tab:RBF_kernels}. Left: $M=1$ inner first-layer center $z_1^{1} = 0$. Middle: $M=1$ inner first-layer center $z_1^{1} = 1$. Right: $M=3$ inner first-layer centers $z_1^{1}=-1.5$, $z_2^{1}=0.0$ and $z_3^{1}=1.5$.
  • Figure 4: Comparison of deep VKOGA models and NNs for the functions $f_2$ (left column), $f_3$ (middle column) and $f_4$ (right column). Top row: Mean relative test error evaluated on the fixed test dataset as a function of network depth (number of layers). The error bars indicate the $5$th-$95$th percentile range of relative test errors across the test dataset. The dashed lines correspond to the deep VKOGA models, the solid lines to the ReLU NNs. Middle row: Training loss (Rippa's loss for deep VKOGA models, MSE loss for NNs) over the number of training epochs. Only every $10$th loss evaluation is plotted for readability. Bottom row: Training residual norm over the number of selected greedy centers.
  • Figure 5: Mean relative test error of deep VKOGA models and NNs with a nearly constant number of trainable parameters evaluated on the fixed test dataset as a function of network depth (number of layers). The error bars indicate the $5$th-$95$th percentile range of relative test errors across the test dataset. The dashed lines correspond to the deep VKOGA models, the solid lines to the ReLU NNs.
  • ...and 6 more figures
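
The captions above mention two ingredients that a short sketch can make concrete: a kernel applied after a linear layer, and Rippa's loss as a training objective. The code below is a hedged illustration under stated assumptions, not the paper's construction. It shows only the simplest two-layer case $k(x, y) = k_{\mathrm{Gauss}}(Wx, Wy)$, without inner centers or kernel activation layers, and it uses the classical Rippa leave-one-out identity $e_i = c_i / (K^{-1})_{ii}$ with $Kc = y$. The names `two_layer_kernel` and `rippa_loo_residuals`, the matrix `W` and the data are hypothetical, and the mean of the squared residuals is one plausible way to form a scalar loss from them.

```python
import numpy as np


def gaussian_kernel(X, Y, eps=1.0):
    """Gaussian kernel matrix with entries exp(-eps^2 * ||x - y||^2)."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-(eps ** 2) * sq_dists)


def two_layer_kernel(X, Y, W, eps=1.0):
    """Gaussian kernel evaluated on linearly transformed inputs W x, W y."""
    return gaussian_kernel(X @ W.T, Y @ W.T, eps)


def rippa_loo_residuals(K, y):
    """Leave-one-out residuals e_i = c_i / (K^{-1})_{ii}, where K c = y."""
    K_inv = np.linalg.inv(K)
    return (K_inv @ y) / np.diag(K_inv)


# Hypothetical data and transformation matrix, chosen for illustration only.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [1.0, 1.0], [0.5, 0.5], [0.25, 0.75]])
y = np.sin(X[:, 0]) + X[:, 1] ** 2
W = np.array([[1.0, 0.5], [0.0, 2.0]])

K = two_layer_kernel(X, X, W, eps=1.0)
loo = rippa_loo_residuals(K, y)
loss = np.mean(loo ** 2)  # scalar objective that could be minimized over W
```

The identity avoids refitting the interpolant once per left-out point, so such a loss stays cheap enough to evaluate on every training batch. In a trainable setting, `W` would be updated by a gradient-based optimizer, which this NumPy sketch does not include.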

Theorems & Definitions (5)

  • Definition 1: Deep kernel
  • Remark 1
  • Remark 2
  • Example 1
  • Remark 3