nSimplex Zen: A Novel Dimensionality Reduction for Euclidean and Hilbert Spaces

Richard Connor, Lucia Vadicamo

Abstract

Dimensionality reduction techniques map values from a high dimensional space to one with a lower dimension. The result is a space which requires less physical memory and has a faster distance calculation. These techniques are widely used where required properties of the reduced-dimension space give an acceptable accuracy with respect to the original space. Many such transforms have been described. They have been classified in two main groups: linear and topological. Linear methods such as Principal Component Analysis (PCA) and Random Projection (RP) define matrix-based transforms into a lower dimension of Euclidean space. Topological methods such as Multidimensional Scaling (MDS) attempt to preserve higher-level aspects such as the nearest-neighbour relation, and some may be applied to non-Euclidean spaces. Here, we introduce nSimplex Zen, a novel topological method of reducing dimensionality. Like MDS, it relies only upon pairwise distances measured in the original space. The use of distances, rather than coordinates, allows the technique to be applied to both Euclidean and other Hilbert spaces, including those governed by Cosine, Jensen-Shannon and Quadratic Form distances. We show that in almost all cases, due to geometric properties of high-dimensional spaces, our new technique gives better properties than others, especially with reduction to very low dimensions.

Paper Structure (55 sections, 2 theorems, 30 equations, 21 figures, 3 tables, 2 algorithms)

Key Result

Lemma C.1

Let $\Sigma_{\text{Base}}\in \mathbb{R}^{n\times (n-1)}$ represent an $(n-1)$-dimensional simplex with vertices $\Sigma_{\text{Base}}[i]\in \ell_2^{n-1}$, with $\Sigma_{\text{Base}}[i][j]=0$ for all $j\geq i$ and $\Sigma_{\text{Base}}[n][n-1]\geq0$. Let $\textbf{v}_i$ be the corresponding vertices in $\e

Figures (21)

  • Figure 1: In $n$ dimensions, for fixed $\textbf{a}$ and $\textbf{b}$ within a given plane, $\textbf{c}$ is sampled from within the same plane at a fixed radius $r$ from $\textbf{b}$. As the dimensionality of the space increases, the probability of $\theta$ being close to $\pi/ 2$ increases rapidly: the right-hand plot shows probability density functions for various dimensions as $t=r\cos \theta$ varies between $-r$ and $r$.
  • Figure 2: Example projection from 3D to 2D using nSimplex. The left figure shows some generated points roughly in a 3D spiral pattern. Two of these points (depicted with red triangles) have been randomly selected to form the reference set $\mathcal{R}$. The right figure shows the 2D projection, formed over a 1D simplex derived from the distance between these points, whose vertices are shown in red. Each other point from the 3D set has been plotted at the apex of the triangle formed from its distances to these two points.
  • Figure 3: Two-dimensional projection of two values based on two reference objects (a), and the two possible planar tetrahedra formed by all four objects (b).
  • Figure 4: The Zen function is defined when the angle between the two triangles is set at $\pi/2$ in the hypothetical further dimension. There is no requirement to calculate a projection in this dimension: $\textit{Zen}(\sigma(u_1),\sigma(u_2)) = \ell_2(\sigma(u_1),\sigma^\theta(u_2))$.
  • Figure 5: Shepard plots for the reduction transforms, each having reduced 100-dimensional generated data to 80 dimensions. For 50 randomly selected values, all pairwise distances are plotted in both original and transformed spaces. The Y-axis represents true distance, and the X-axis is the distance measured in the reduced space. The solid black line shows the fitted least-squares monotonic regression function from which the Kruskal stress ($S_K$) is measured. It can be seen that nSimplexZen and RP point clouds are centred around the true distance function ($y =x$, the dashed line), whereas PCA is a contraction mapping. While MDS gives the appearance of a contraction mapping, in fact this is not a guarantee.
  • ...and 16 more figures
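
The captions of Figures 2 to 4 describe the two-reference case closely enough to sketch it in code. The following is a minimal illustration under stated assumptions, not the authors' implementation. All function names (`apex_2d`, `project`, `zen_2d`, and the two bound helpers) are hypothetical. The base simplex is assumed to be placed at $(0,0)$ and $(d,0)$, where $d$ is the distance between the two reference points. The Zen formula used here is our reading of the Figure 4 caption: with the two triangles at an angle of $\pi/2$ about the shared base, the altitudes contribute as $y_1^2 + y_2^2$, which lies between the coplanar cases $(y_1 - y_2)^2$ and $(y_1 + y_2)^2$ suggested by Figure 3.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def apex_2d(d1, d2, base):
    """Apex (x, y), y >= 0, of the triangle over the base (0,0)-(base,0)
    whose sides to the two base vertices have lengths d1 and d2."""
    x = (d1 * d1 - d2 * d2 + base * base) / (2.0 * base)
    # Clamp at zero to absorb tiny negative values from rounding.
    y = math.sqrt(max(d1 * d1 - x * x, 0.0))
    return (x, y)


def project(u, r1, r2):
    """Project u to 2D using only its distances to references r1 and r2."""
    return apex_2d(euclidean(u, r1), euclidean(u, r2), euclidean(r1, r2))


def lower_bound_2d(p, q):
    """Coplanar case, both apexes on the same side of the base."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)


def upper_bound_2d(p, q):
    """Coplanar case, apexes on opposite sides of the base."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] + q[1]) ** 2)


def zen_2d(p, q):
    """Assumed Zen estimate: the two triangles meet at an angle of pi/2."""
    return math.sqrt((p[0] - q[0]) ** 2 + p[1] ** 2 + q[1] ** 2)


if __name__ == "__main__":
    r1, r2 = (0.0, 0.0, 0.0), (2.0, 0.0, 0.0)   # reference points
    u1, u2 = (1.0, 1.0, 0.0), (1.0, 0.0, 1.0)   # points to project
    p1, p2 = project(u1, r1, r2), project(u2, r1, r2)
    print("true distance:", euclidean(u1, u2))
    print("lower, zen, upper:",
          lower_bound_2d(p1, p2), zen_2d(p1, p2), upper_bound_2d(p1, p2))
```

In this hand-picked example the two points really do lie in perpendicular planes through the base, so the Zen estimate coincides with the true distance. For general data it is only an estimate lying between the two bounds.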

Theorems & Definitions (4)

  • Lemma C.1: Correctness of the ApexAddition algorithm
  • proof
  • Lemma C.2: n-Simplex Distance Constraint
  • proof