Tensor Network-Constrained Kernel Machines as Gaussian Processes

Frederiek Wesel, Kim Batselier

TL;DR

This paper proves that the outputs of Canonical Polyadic Decomposition (CPD) and Tensor Train (TT)-constrained kernel machines recover a Gaussian Process (GP) when i.i.d. priors are placed over their parameters. It also shows that TT yields models exhibiting more GP behavior than CPD for the same number of model parameters.

Abstract

Tensor Networks (TNs) have recently been used to speed up kernel machines by constraining the model weights, yielding exponential computational and storage savings. In this paper we prove that the outputs of Canonical Polyadic Decomposition (CPD) and Tensor Train (TT)-constrained kernel machines recover a Gaussian Process (GP), which we fully characterize, when placing i.i.d. priors over their parameters. We analyze the convergence of both CPD and TT-constrained models, and show how TT yields models exhibiting more GP behavior compared to CPD, for the same number of model parameters. We empirically observe this behavior in two numerical experiments where we respectively analyze the convergence to the GP and the performance at prediction. We thereby establish a connection between TN-constrained kernel machines and GPs.

Paper Structure

This paper contains 19 sections, 10 theorems, 59 equations, 3 figures.

Key Result

Lemma 2.2

Consider the product kernel of Definition 2.1. Denote the basis functions and prior covariance of each factor $k^{(d)}(x_d,x_d')$ as ${\bm{\varphi}^{(d)}(x_d)\in\mathbb{R}^{M_d}}$ and $\bm{\Lambda}^{(d)}\in\mathbb{R}^{M_d\times M_d}$, respectively. Then the basis functions and prior covariance of [...]
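The lemma statement is cut off above, so the following is only a sketch of the standard identity it appears to concern, not the authors' formulation. Assuming each factor kernel has the form $k^{(d)}(x_d,x_d') = \bm{\varphi}^{(d)}(x_d)^\top \bm{\Lambda}^{(d)} \bm{\varphi}^{(d)}(x_d')$, the mixed-product property of the Kronecker product implies that the product of the factor kernels equals a single kernel built from Kronecker-structured features and covariance. The sizes, random features, and variable names below are hypothetical.

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(0)

M = [2, 3, 4]  # hypothetical number of basis functions M_d per factor (D = 3)

# Hypothetical feature vectors phi^(d)(x_d) and phi^(d)(x_d'), drawn at random
# because only the algebraic identity matters here.
phi_x = [rng.standard_normal(m) for m in M]
phi_xp = [rng.standard_normal(m) for m in M]

# Random symmetric positive semi-definite prior covariances Lambda^(d).
Lam = []
for m in M:
    A = rng.standard_normal((m, m))
    Lam.append(A @ A.T)

# Product kernel: multiply the D factor kernels phi^T Lambda phi'.
k_product = np.prod([px @ L @ pxp for px, L, pxp in zip(phi_x, Lam, phi_xp)])

# Same value from Kronecker-structured features and covariance.
phi_kron_x = reduce(np.kron, phi_x)    # length M_1 * M_2 * M_3
phi_kron_xp = reduce(np.kron, phi_xp)
Lam_kron = reduce(np.kron, Lam)        # (M_1 M_2 M_3) x (M_1 M_2 M_3)
k_kron = phi_kron_x @ Lam_kron @ phi_kron_xp

print(np.isclose(k_product, k_kron))
```

The Kronecker feature vector has $\prod_d M_d$ entries, which is the exponential growth in $D$ that constraining the weights with a tensor network is meant to avoid.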

Figures (3)

  • Figure 1: Histograms of the empirical PDF of CPD (blue) and TT (orange) models specified in Theorems 3.1 and 3.2, evaluated at a random point as a function of model parameters $P$ for $D=16$. The black line is the PDF of the GP. Notice how TT converges faster to the GP for the same number of model parameters $P$.
  • Figure 2: Mean and standard deviation of the Cramér–von Mises statistic $W^2$ between the empirical CDF of CPD and TT models specified in Theorems 3.1 and 3.2, evaluated at $N=10$ random points as a function of model parameters $P$ for $D=2,4,8,16$. The two models are equivalent for $D=2$. Notice how TT converges faster to the GP as the dimensionality of the inputs $D$ increases.
  • Figure 3: Mean and standard deviation of the test RMSE of CPD and TT models as a function of model parameters $P$, as well as that of their target GP. Notice how on the yacht dataset TT exhibits more GP behavior than CPD as $P$ increases. On the energy dataset both methods exhibit GP behavior already for ranks different from one, which explains why TT appears to be slower.
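As a rough illustration of the statistic named in Figure 2, the sketch below computes the one-sample, univariate Cramér–von Mises statistic $W^2$ against a Gaussian reference CDF. How the paper aggregates the statistic over its $N=10$ evaluation points is not stated here, so this is only an assumed simplified form; the helper names `normal_cdf` and `cramer_von_mises` are hypothetical.

```python
import math
import numpy as np

def normal_cdf(x, mean=0.0, std=1.0):
    """CDF of N(mean, std^2), evaluated elementwise."""
    z = (np.asarray(x, dtype=float) - mean) / (std * math.sqrt(2.0))
    return 0.5 * (1.0 + np.vectorize(math.erf)(z))

def cramer_von_mises(samples, cdf):
    """One-sample statistic W^2 = 1/(12n) + sum_i ((2i-1)/(2n) - F(x_(i)))^2."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    return 1.0 / (12.0 * n) + np.sum(((2.0 * i - 1.0) / (2.0 * n) - cdf(x)) ** 2)

rng = np.random.default_rng(0)
gaussian = rng.standard_normal(2000)
# A product of two independent standard Gaussians has unit variance
# but is clearly non-Gaussian (sharp peak at zero, heavier tails).
non_gaussian = rng.standard_normal(2000) * rng.standard_normal(2000)

w2_gauss = cramer_von_mises(gaussian, normal_cdf)
w2_non = cramer_von_mises(non_gaussian, normal_cdf)
print(w2_gauss, w2_non)
```

A smaller $W^2$ indicates that the empirical distribution is closer to the reference Gaussian, which is how the figure's convergence curves are read.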

Theorems & Definitions (23)

  • Definition 2.1: Product kernel [rasmussen_gaussian_2006]
  • Lemma 2.2: Basis functions and prior covariances of product kernels
  • Definition 2.3: CPD [hitchcock_expression_1927]
  • Definition 2.4: TT [oseledets_tensor-train_2011]
  • Definition 2.5: CPD-constrained kernel machine
  • Definition 2.6: TT-constrained kernel machine
  • Theorem 3.1: GP limit of CPD-constrained kernel machine
  • Proof
  • Theorem 3.2: GP limit of TT-constrained kernel machine
  • Proof
  • ...and 13 more
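The GP limit of a CPD-constrained kernel machine can be explored numerically with a minimal sketch like the one below. It rests on several assumptions that are not taken from the text above: the model output is $f(x)=\sum_{r=1}^{R}\prod_{d=1}^{D}\bm{\varphi}^{(d)}(x_d)^\top \bm{w}_r^{(d)}$, all weights are i.i.d. zero-mean Gaussian with variance $R^{-1/D}$, and $\bm{\Lambda}^{(d)}=\bm{I}$. Under these assumptions the output variance equals the product kernel for every $R$, and the excess kurtosis is $(3^D-3)/R$, which vanishes as $R$ grows. All sizes and features are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

D, M, R = 3, 4, 20   # illustrative sizes, not taken from the paper
S = 20_000           # number of independent prior draws

# Hypothetical per-dimension features phi^(d)(x_d) at one fixed input x.
phi = rng.standard_normal((D, M))

# Assumption: Lambda^(d) = I, so each factor kernel is ||phi^(d)(x_d)||^2
# and the target GP variance k(x, x) is their product.
k_target = np.prod(np.sum(phi**2, axis=1))

# Assumed prior: every weight is i.i.d. N(0, R^{-1/D}), so the product
# over the D factors carries a total 1/R variance scaling.
W = rng.standard_normal((S, D, M, R)) * R ** (-0.5 / D)

# CPD-constrained output: f(x) = sum_r prod_d phi^(d)(x_d)^T w_r^(d).
proj = np.einsum("dm,sdmr->sdr", phi, W)   # shape (S, D, R)
f = proj.prod(axis=1).sum(axis=1)          # shape (S,)

var_emp = f.var()
excess_kurt = np.mean((f - f.mean()) ** 4) / var_emp**2 - 3.0

print(f"target variance {k_target:.3f}, empirical {var_emp:.3f}")
print(f"excess kurtosis {excess_kurt:.3f} (0 for an exact Gaussian)")
```

Increasing `R` while keeping `D` fixed should drive the excess kurtosis toward zero, which is one simple way to observe the output distribution approaching a Gaussian.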