An Exact Finite-dimensional Explicit Feature Map for Kernel Functions

Kamaledin Ghiasi-Shirazi, Mohammadreza Qaraei

TL;DR

This paper introduces an explicit, finite-dimensional feature map for an arbitrary kernel function such that the inner product of data points in the feature space equals the kernel function value, during both training and testing.

Abstract

Kernel methods in machine learning use a kernel function that takes two data points as input and returns their inner product after an implicit mapping to a Hilbert space, without actually computing that mapping. For many kernel functions, such as the Gaussian and Laplacian kernels, the feature space is known to be infinite-dimensional, so operations in this space are possible only implicitly. This implicitness requires algorithms to be expressed using dual representations and the kernel trick. In this paper, given an arbitrary kernel function, we introduce an explicit, finite-dimensional feature map that ensures the inner product of data points in the feature space equals the kernel function value, during both training and testing. The existence of this explicit mapping allows kernelized algorithms to be formulated in their primal form, without the kernel trick or the dual representation. As a first application, we show how to derive kernelized machine learning algorithms directly, without resorting to the dual representation, and apply this approach to PCA. As another application, we use the t-SNE algorithm, without any change to the algorithm or its implementation, to visualize the feature space of kernel functions.
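To illustrate the primal route to kernel PCA described above, the sketch below maps the data explicitly and then runs ordinary linear PCA on the resulting vectors. It assumes the feature map $\phi(z) = (K^{1/2})^{+} k_z$, where $k_z = [k(x_1, z), \ldots, k(x_N, z)]^T$ and $^{+}$ is the pseudo-inverse; this is an illustrative construction with the stated inner-product property, not necessarily the paper's exact definition. The Gaussian kernel, `gamma`, and the helper names `rbf`, `explicit_map`, and `linear_pca` are hypothetical.

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    """Gaussian kernel matrix between the rows of A and the rows of B (assumed kernel)."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def explicit_map(X_train, kernel, rel_tol=1e-10):
    """Assumed explicit map phi(z) = (K^{1/2})^+ k_z, applied to each row z of Z."""
    w, U = np.linalg.eigh(kernel(X_train, X_train))   # K = U diag(w) U^T
    keep = w > rel_tol * w.max()                        # drop numerically zero eigenvalues
    inv_sqrt = (U[:, keep] / np.sqrt(w[keep])) @ U[:, keep].T
    return lambda Z: kernel(Z, X_train) @ inv_sqrt      # row i is phi(z_i)

def linear_pca(F, n_components):
    """Plain PCA on explicit feature vectors F (one row per point); no kernel trick."""
    mean = F.mean(axis=0)
    _, _, Vt = np.linalg.svd(F - mean, full_matrices=False)
    W = Vt[:n_components].T               # principal directions in feature space
    return lambda G: (G - mean) @ W       # project (possibly new) feature vectors

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
phi = explicit_map(X, rbf)
project = linear_pca(phi(X), n_components=2)
scores = project(phi(X))                  # kernel-PCA scores of the training data
X_new = rng.normal(size=(4, 2))
new_scores = project(phi(X_new))          # test points go through the same primal pipeline
```

Under this construction the training scores agree, up to the sign of each component, with classical dual kernel PCA on the centered Gram matrix. Test points also agree, because centering and projection only involve inner products with training points. Since `phi(X)` is an ordinary array, it could likewise be passed unchanged to an off-the-shelf t-SNE implementation such as `sklearn.manifold.TSNE`.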

Paper Structure

This paper contains 9 sections, 1 theorem, 39 equations, and 4 figures.

Key Result

Theorem 1

Suppose $K$ is the kernel matrix over $N$ training data points $\{x_1, \ldots, x_N\}$, with entries $K_{ij} = k(x_i, x_j)$. Consider the mapping $\phi: X \rightarrow \mathbb{R}^N$ defined as: […]. For every pair $(x_n, z)$, where $x_n$ is a training data point and $z$ is an arbitrary data point, we have:

$$\langle \phi(x_n), \phi(z) \rangle = k(x_n, z).$$
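The property stated in Theorem 1, that the inner product between a training point and an arbitrary point reproduces the kernel value, can be checked numerically. The sketch below assumes one concrete construction, $\phi(z) = (K^{1/2})^{+} k_z$ with $k_z = [k(x_1, z), \ldots, k(x_N, z)]^T$ and $^{+}$ the Moore-Penrose pseudo-inverse; it is an illustrative choice with this property, not necessarily the paper's exact definition. The Gaussian kernel, `gamma`, and the names `gaussian_kernel` and `fit_feature_map` are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix: entry (i, j) is exp(-gamma * ||a_i - b_j||^2)."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_feature_map(X_train, kernel, rel_tol=1e-10):
    """Build phi(z) = (K^{1/2})^+ k_z (assumed construction).

    Returns a function mapping an (M, d) array to an (M, N) array whose
    i-th row is the explicit N-dimensional feature vector phi(z_i).
    """
    K = kernel(X_train, X_train)
    w, U = np.linalg.eigh(K)                    # K = U diag(w) U^T, K symmetric PSD
    keep = w > rel_tol * w.max()                # discard numerically zero eigenvalues
    Uk = U[:, keep]
    inv_sqrt = (Uk / np.sqrt(w[keep])) @ Uk.T   # pseudo-inverse of K^{1/2}, symmetric

    def phi(Z):
        # kernel(Z, X_train) has rows k_z^T, and k_z^T @ inv_sqrt = phi(z)^T
        return kernel(Z, X_train) @ inv_sqrt

    return phi

rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 3))   # N = 20 training points
Z = rng.normal(size=(5, 3))          # arbitrary (e.g. test) points

phi = fit_feature_map(X_train, gaussian_kernel)
inner = phi(X_train) @ phi(Z).T      # <phi(x_n), phi(z)> for all n and z
exact = gaussian_kernel(X_train, Z)  # k(x_n, z)
print(np.max(np.abs(inner - exact)))  # only floating-point round-off remains
```

With this construction, the inner product of two non-training points equals $k_z^T K^{+} k_{z'}$, which in general only approximates $k(z, z')$ (it is the Nyström approximation with all training points as landmarks). This is consistent with the theorem covering only pairs that include a training point.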

Figures (4)

  • Figure 1: Visualization of the feature space using the inappropriate inner product kernel function $k_1$ for digits 2, 4, and 7 from the MNIST dataset. $k_1$ is a degree-9 inner product kernel function applied to pixels with intensities in the range [0, 1]. Due to the presence of zero-valued pixels, the similarity of a data point with itself can be very low.
  • Figure 2: Visualization of the feature space using the appropriate inner product kernel function $k_2$ for digits 2, 4, and 7 from the MNIST dataset. $k_2$ is an inner product kernel function applied to pixels with intensities in the range [-1, 1], adjusted to behave like an RBF kernel so that the similarity of each data point with itself is nearly one and the minimum similarity value is zero.
  • Figure 3: Extracted features for the test data using Fisher analysis for the inappropriate inner product kernel function $k_1$ applied to digits 2, 4, and 7 from the MNIST dataset. $k_1$ is a degree-9 inner product kernel function applied to pixels with intensities in the range [0, 1]. Due to the presence of zero-valued pixels, the similarity of a data point with itself can be very low.
  • Figure 4: Extracted features for the test data using Fisher analysis for the appropriate inner product kernel function $k_2$ applied to digits 2, 4, and 7 from the MNIST dataset. $k_2$ is an inner product kernel function applied to pixels with intensities in the range [-1, 1]. It has been configured to mimic the behavior of an RBF (Radial Basis Function) kernel, ensuring that each image is highly similar to itself (similarity close to 1) and the minimum similarity value is zero.

Theorems & Definitions (2)

  • Theorem 1
  • Proof