Quantized Fourier and Polynomial Features for more Expressive Tensor Network Models

Frederiek Wesel, Kim Batselier

TL;DR

The paper tackles the exponential growth of the feature space in kernel machines that use polynomial and Fourier features. It introduces a quantized tensorized kernel machine (QTKM), which further tensorizes these features and quantizes the model weights accordingly, increasing expressiveness at no additional computational cost. The authors prove that, for the same number of parameters, quantized models have a higher upper bound on the VC-dimension than their non-quantized counterparts. Their experiments show that quantized models generalize better in the underparameterized regime and perform strongly on large-scale tasks such as airline regression, trained on a laptop. The work offers a scalable route to more expressive nonlinear kernel models and suggests that quantization acts as a regularizer by prioritizing the most salient features in the data. The approach is compatible with common tensor-network architectures (CPD/TT/TR) and can be extended to other TN-based learning methods.

Abstract

In the context of kernel machines, polynomial and Fourier features are commonly used to provide a nonlinear extension to linear models by mapping the data to a higher-dimensional space. Unless one considers the dual formulation of the learning problem, which renders exact large-scale learning infeasible, the exponential increase of model parameters in the dimensionality of the data caused by their tensor-product structure prohibits tackling high-dimensional problems. One of the possible approaches to circumvent this exponential scaling is to exploit the tensor structure present in the features by constraining the model weights to be an underparametrized tensor network. In this paper we quantize, i.e. further tensorize, polynomial and Fourier features. Based on this feature quantization we propose to quantize the associated model weights, yielding quantized models. We show that, for the same number of model parameters, the resulting quantized models have a higher bound on the VC-dimension than their non-quantized counterparts, at no additional computational cost while learning from identical features. We verify experimentally how this additional tensorization regularizes the learning problem by prioritizing the most salient features in the data and how it provides models with increased generalization capabilities. We finally benchmark our approach on a large regression task, achieving state-of-the-art results on a laptop computer.

Paper Structure (22 sections, 11 theorems, 38 equations, 5 figures, 2 tables)

Key Result

Theorem 2.3

Suppose $\operatorname{ten}\left(\mathbf{w}, M_1, M_2, \ldots, M_D\right)$ is a tensor in CPD, TT or TR form. Then model responses and associated gradients can be computed in $\mathcal{O}(P)$ instead of $\mathcal{O}(\prod_{d=1}^D M_d)$, where $P=DMR$ in case of CPD, and $P=DMR^2$ in case of TT or TR.
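
As an illustration of the CPD case of the theorem, the following minimal sketch evaluates a model response without ever forming the full weight tensor. It is not the authors' implementation. It assumes pure-power polynomial features, equal sizes $M_d = M$ for all $d$, and a single input sample; the function names `poly_features`, `cpd_response` and `full_response` are hypothetical. The explicit Kronecker-product version is included only as a reference to check the result.

```python
import numpy as np
from functools import reduce


def poly_features(x_d, M):
    """Pure-power features [1, x, ..., x^(M-1)] for one input dimension (assumed feature map)."""
    return x_d ** np.arange(M)


def cpd_response(factors, x):
    """Model response with CPD-constrained weights, costing O(D*M*R).

    factors: list of D arrays of shape (M, R), one CPD factor per input dimension.
    x:       array of shape (D,), a single input sample.
    """
    R = factors[0].shape[1]
    prod = np.ones(R)
    for W_d, x_d in zip(factors, x):
        # Inner product of the d-th feature vector with each of the R columns.
        prod *= poly_features(x_d, W_d.shape[0]) @ W_d
    return prod.sum()


def full_response(factors, x):
    """Reference only: build the length-M^D weight and feature vectors explicitly."""
    M, R = factors[0].shape
    w = sum(reduce(np.kron, [W_d[:, r] for W_d in factors]) for r in range(R))
    phi = reduce(np.kron, [poly_features(x_d, M) for x_d in x])
    return w @ phi


rng = np.random.default_rng(0)
D, M, R = 3, 4, 2
factors = [rng.standard_normal((M, R)) for _ in range(D)]
x = rng.uniform(-1.0, 1.0, size=D)
print(cpd_response(factors, x), full_response(factors, x))
```

The two functions agree because the inner product of two Kronecker products factorizes into a product of small inner products, which is what makes the $\mathcal{O}(DMR)$ evaluation possible.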

Figures (5)

  • Figure 1: TKM with TT-constrained weights.
  • Figure 2: Corresponding QTKM with $Q$-quantized TT-constrained weights.
  • Figure 4: Plots of the test mean squared error as a function of the number of model parameters $P$, for different real-life datasets. In blue, random Fourier features [rahimi_random_2007]; in red, tensorized kernel machines with Fourier features [wahls_learning_2014; stoudenmire_supervised_2016; kargas_supervised_2021; wesel_large-scale_2021]; in yellow, quantized kernel machines with Fourier features, with quantization $Q=2$. The gray horizontal solid line is the full unconstrained optimization problem, which corresponds to kernel ridge regression (KRR). The gray vertical dotted line is set at $P=N$. It can be seen that in the $P<N$ case, quantization achieves better generalization performance than the non-quantized case.
  • Figure 5: Sound dataset. In red, plot of the magnitude of the quantized Fourier coefficients for different values of $R$ and total number of model parameters $P$. The magnitude of the full unconstrained Fourier coefficients is shown in black. It can be observed that increasing the CPD rank $R$ recovers the peaks of frequencies with the highest magnitude.
  • Figure 6: Plots of the train mean squared error as a function of the number of model parameters $P$, for different real-life datasets. In blue, random Fourier features [rahimi_random_2007]; in red, tensorized kernel machines with Fourier features [wahls_learning_2014; stoudenmire_supervised_2016; kargas_supervised_2021; wesel_large-scale_2021]; in yellow, quantized kernel machines with Fourier features, with quantization $Q=2$. The gray horizontal solid line is the full unconstrained optimization problem, which corresponds to kernel ridge regression (KRR). The gray vertical dotted line is set at $P=N$. It can be seen that in the $P<N$ case, quantization achieves better performance than the non-quantized case on the training set (this figure) and on the test set (Figure 4).

Theorems & Definitions (25)

  • Definition 2.1: Canonical polyadic decomposition [hitchcock_expression_1927; kolda_tensor_2009]
  • Definition 2.2: Tensor train [oseledets_tensor-train_2011]
  • Theorem 2.3: Tensorized kernel machine (TKM)
  • proof
  • Definition 3.1: Pure-power polynomial feature map [chen_parallelized_2018]
  • Definition 3.2
  • Definition 3.3: Quantized Vandermonde vector
  • Theorem 3.4: Quantized pure-power-$(M_d-1)$ polynomial feature map
  • proof
  • Corollary 3.5: Quantized pure-power polynomials
  • ...and 15 more
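
The idea behind quantizing a pure-power polynomial feature map can be sketched numerically. A Vandermonde vector $[1, x, \ldots, x^{M-1}]$ of length $M = Q^K$ factorizes into a Kronecker product of $K$ vectors of length $Q$, because every exponent $0, \ldots, Q^K - 1$ has a unique base-$Q$ expansion. The sketch below illustrates this fact only; it does not reproduce the paper's exact definitions or ordering conventions, and the names `vandermonde` and `quantized_factors` are hypothetical.

```python
import numpy as np
from functools import reduce


def vandermonde(x, M):
    """Pure-power features [1, x, x^2, ..., x^(M-1)]."""
    return x ** np.arange(M)


def quantized_factors(x, K, Q=2):
    """K vectors of length Q whose Kronecker product is the length-Q^K Vandermonde vector.

    Factor k holds the powers x^(q * Q^k) for q = 0..Q-1. The most significant
    digit comes first so that np.kron reproduces the usual exponent ordering.
    """
    return [x ** (np.arange(Q) * Q ** k) for k in reversed(range(K))]


x, K = 0.7, 3
factors = quantized_factors(x, K)   # three vectors: [1, x^4], [1, x^2], [1, x]
rebuilt = reduce(np.kron, factors)  # length 2^3 = 8
print(np.allclose(rebuilt, vandermonde(x, 2 ** K)))
```

The same factorization applies to complex exponentials, since $e^{\mathrm{i} m x} = (e^{\mathrm{i} x})^m$, so passing `np.exp(1j * x)` in place of `x` gives the analogous decomposition for Fourier-type features. The full vector has $Q^K$ entries, while the factors together hold only $KQ$.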