Tensor Network-Constrained Kernel Machines as Gaussian Processes

Frederiek Wesel; Kim Batselier

TL;DR

This paper proves that the outputs of Canonical Polyadic Decomposition (CPD) and Tensor Train (TT)-constrained kernel machines recover a Gaussian Process (GP) when i.i.d. priors are placed over their parameters, and shows that TT yields models exhibiting more GP behavior than CPD for the same number of model parameters.

Abstract

Tensor Networks (TNs) have recently been used to speed up kernel machines by constraining the model weights, yielding exponential computational and storage savings. In this paper we prove that the outputs of Canonical Polyadic Decomposition (CPD) and Tensor Train (TT)-constrained kernel machines recover a Gaussian Process (GP), which we fully characterize, when placing i.i.d. priors over their parameters. We analyze the convergence of both CPD and TT-constrained models, and show how TT yields models exhibiting more GP behavior compared to CPD, for the same number of model parameters. We empirically observe this behavior in two numerical experiments where we respectively analyze the convergence to the GP and the performance at prediction. We thereby establish a connection between TN-constrained kernel machines and GPs.
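The GP limit described in the abstract can be illustrated with a small Monte Carlo sketch. It is not the authors' code: the cosine basis `features`, the diagonal factor covariance `lam`, and the prior scaling `R ** (-1 / D)` on each factor are assumptions chosen so that the output variance equals the product-kernel value $k(\bm{x},\bm{x})$ at every rank $R$. Under those assumptions the excess kurtosis of a rank-$R$ CPD output at a single point is $(3^D - 3)/R$, which vanishes as $R$ grows.

```python
import numpy as np

def features(x_d, M):
    """Hypothetical per-dimension basis: cos(pi * j * x_d) for j = 0..M-1."""
    return np.cos(np.pi * np.arange(M) * x_d)

def target_variance(x, lam):
    """k(x, x) of the product kernel with factor covariance diag(lam)."""
    M = lam.shape[0]
    return np.prod([np.sum(lam * features(x_d, M) ** 2) for x_d in x])

def sample_cpd_outputs(x, R, lam, n_draws, rng):
    """Draw n_draws outputs f(x) of a rank-R CPD-constrained model.

    f(x) = sum_r prod_d  w_r^(d)^T phi^(d)(x_d), with independent Gaussian
    factors w_r^(d) ~ N(0, R^(-1/D) diag(lam))  (assumed scaling).
    """
    D, M = x.shape[0], lam.shape[0]
    Phi = np.stack([features(x_d, M) for x_d in x])      # shape (D, M)
    std = np.sqrt(lam) * R ** (-0.5 / D)                 # per-weight std, shape (M,)
    W = rng.standard_normal((n_draws, R, D, M)) * std    # factor weights
    proj = np.einsum("nrdm,dm->nrd", W, Phi)             # inner products per factor
    return proj.prod(axis=2).sum(axis=1)                 # product over d, sum over r

def excess_kurtosis(s):
    """Sample excess kurtosis; zero for a Gaussian."""
    c = s - s.mean()
    return np.mean(c ** 4) / np.mean(c ** 2) ** 2 - 3.0

rng = np.random.default_rng(0)
x = np.array([0.1, 0.4, 0.7])                # D = 3 inputs (illustrative)
lam = 1.0 / (1.0 + np.arange(4)) ** 2        # M = 4 decaying prior variances
for R in (1, 4, 16, 64):
    f = sample_cpd_outputs(x, R, lam, 5000, rng)
    print(R, f.var() / target_variance(x, lam), excess_kurtosis(f))
```

The variance ratio should stay near one for every rank, while the excess kurtosis shrinks roughly like $1/R$, which is the single-point signature of convergence to a Gaussian.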

Paper Structure

This paper contains 19 sections, 10 theorems, 59 equations, and 3 figures.

Introduction
Background
Basis Function Approximation
Product Kernels
Tensor Networks
Tensor Network-Constrained Kernel Machines
TN-Constrained Kernel Machines as GPs
Convergence Rates to the GP
Consequences for MAP Estimation
Numerical Experiments
GP Convergence
GP Behavior at Prediction
Related Work
Conclusion
Notation
...and 4 more sections

Key Result

Lemma 2.2

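Consider the product kernel of Definition 2.1. Denote the basis functions and prior covariance of each factor $k^{(d)}(x_d,x_d')$ as $\bm{\varphi}^{(d)}(x_d)\in\mathbb{R}^{M_d}$ and $\bm{\Lambda}^{(d)}\in\mathbb{R}^{M_d\times M_d}$ respectively. Then the basis functions and prior covariance of $k(\bm{x},\bm{x}')$ are the Kronecker products $\bm{\varphi}(\bm{x}) = \bm{\varphi}^{(1)}(x_1)\otimes\cdots\otimes\bm{\varphi}^{(D)}(x_D)$ and $\bm{\Lambda} = \bm{\Lambda}^{(1)}\otimes\cdots\otimes\bm{\Lambda}^{(D)}$.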
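A quick numerical check can make the lemma concrete. The sketch below assumes each factor kernel has the form $k^{(d)}(x_d,x_d') = \bm{\varphi}^{(d)}(x_d)^\top \bm{\Lambda}^{(d)} \bm{\varphi}^{(d)}(x_d')$ and reads the lemma as saying that the product kernel is obtained from the Kronecker products of the per-factor basis functions and covariances. The function name `product_kernel_two_ways` and the random test factors are illustrative, not taken from the paper.

```python
from functools import reduce
import numpy as np

def product_kernel_two_ways(phis_x, phis_y, lams):
    """Evaluate a product kernel factor-by-factor and via Kronecker products.

    phis_x, phis_y: lists of per-dimension feature vectors phi^(d)(x_d), phi^(d)(x'_d)
    lams:           list of per-dimension prior covariances Lambda^(d)
    """
    # Product of the D factor kernels phi^(d)(x_d)^T Lambda^(d) phi^(d)(x'_d).
    k_factors = np.prod([px @ L @ py for px, L, py in zip(phis_x, lams, phis_y)])
    # Same value from the Kronecker-structured features and covariance.
    phi_x = reduce(np.kron, phis_x)      # length M_1 * ... * M_D
    phi_y = reduce(np.kron, phis_y)
    lam = reduce(np.kron, lams)          # (M_1 ... M_D) x (M_1 ... M_D)
    return k_factors, phi_x @ lam @ phi_y

rng = np.random.default_rng(0)
sizes = (2, 3, 4)                        # M_d per dimension (illustrative)
phis_x = [rng.standard_normal(m) for m in sizes]
phis_y = [rng.standard_normal(m) for m in sizes]
lams = [np.diag(rng.uniform(0.1, 1.0, m)) for m in sizes]

k_factors, k_kron = product_kernel_two_ways(phis_x, phis_y, lams)
print(np.isclose(k_factors, k_kron))
```

The Kronecker form needs $\prod_d M_d$ features, which grows exponentially in $D$; this is the cost that constraining the weights to a low-rank tensor network avoids.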

Figures (3)

Figure 1: Histograms of the empirical PDF of the CPD (blue) and TT (orange) models specified in Theorems 3.1 and 3.2, evaluated at a random point, as a function of the number of model parameters $P$ for $D=16$. The black line is the PDF of the GP. Notice how TT converges faster to the GP for the same number of model parameters $P$.
Figure 2: Mean and standard deviation of the Cramér–von Mises statistic $W^2$ between the empirical CDF of the CPD and TT models specified in Theorems 3.1 and 3.2, evaluated at $N=10$ random points, as a function of the number of model parameters $P$ for $D=2,4,8,16$. The two models are equivalent for $D=2$. Notice how TT converges faster to the GP as the dimensionality of the inputs $D$ increases.
Figure 3: Mean and standard deviation of the test RMSE of the CPD and TT models, as well as of their target GP, as a function of the number of model parameters $P$. Notice how on the yacht dataset TT exhibits more GP behavior than CPD as $P$ increases. On the energy dataset both methods already exhibit GP behavior for ranks greater than one, which explains why TT appears to be slower.
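The Cramér–von Mises statistic used in Figure 2 can be sketched as follows. This is a one-dimensional, single-point simplification and not the paper's experimental protocol, which is not reproduced on this page. The statistic uses the standard one-sample formula $W^2 = \frac{1}{12n} + \sum_{i=1}^{n}\left(\frac{2i-1}{2n} - F(x_{(i)})\right)^2$. The helper `toy_cpd_samples` is a hypothetical stand-in for a CPD model output at a point where every factor has unit variance.

```python
import math
import numpy as np

def cramer_von_mises(samples, cdf):
    """One-sample Cramér–von Mises statistic W^2 of samples against a reference CDF."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    return 1.0 / (12 * n) + np.sum(((2 * i - 1) / (2 * n) - cdf(x)) ** 2)

def std_normal_cdf(x):
    """CDF of the standard normal distribution, elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(x) / math.sqrt(2.0)))

def toy_cpd_samples(R, D, n, rng):
    """Hypothetical proxy: R^(-1/2) * sum of R products of D standard normals (unit variance)."""
    z = rng.standard_normal((n, R, D))
    return z.prod(axis=2).sum(axis=1) / math.sqrt(R)

rng = np.random.default_rng(0)
for R in (1, 16, 256):
    s = toy_cpd_samples(R, D=4, n=2000, rng=rng)
    print(R, cramer_von_mises(s, std_normal_cdf))
```

Smaller values of $W^2$ indicate an empirical distribution closer to the Gaussian reference, so the printed statistic should fall as $R$ increases.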

Theorems & Definitions (23)

Definition 2.1: Product kernel (Rasmussen and Williams, 2006)
Lemma 2.2: Basis functions and prior covariances of product kernels
Definition 2.3: CPD (Hitchcock, 1927)
Definition 2.4: TT (Oseledets, 2011)
Definition 2.5: CPD-constrained kernel machine
Definition 2.6: TT-constrained kernel machine
Theorem 3.1: GP limit of CPD-constrained kernel machine
proof
Theorem 3.2: GP limit of TT-constrained kernel machine
proof
...and 13 more