Robust estimation of the intrinsic dimension of data sets with quantum cognition machine learning

Luca Candelori, Alexander G. Abanov, Jeffrey Berger, Cameron J. Hogan, Vahagn Kirakosyan, Kharen Musaelian, Ryan Samson, James E. T. Smith, Dario Villani, Martin T. Wells, Mengjia Xu

TL;DR

We introduce a new data representation method based on Quantum Cognition Machine Learning and apply it to manifold learning, specifically to estimating the intrinsic dimension of data sets. The resulting estimates are shown to be robust to point-wise Gaussian noise.

Abstract

We propose a new data representation method based on Quantum Cognition Machine Learning and apply it to manifold learning, specifically to the estimation of the intrinsic dimension of data sets. The idea is to learn a representation of each data point as a quantum state, encoding both local properties of the point and its relation to the entire data set. Inspired by ideas from quantum geometry, we then construct from the quantum states a point cloud equipped with a quantum metric. The metric exhibits a spectral gap whose location corresponds to the intrinsic dimension of the data. The proposed estimator is based on the detection of this spectral gap. When tested on synthetic manifold benchmarks, our estimates are shown to be robust with respect to the introduction of point-wise Gaussian noise. This is in contrast to current state-of-the-art estimators, which tend to attribute artificial "shadow dimensions" to noise artifacts, leading to overestimates. This robustness is a significant advantage when dealing with real data sets, which are inevitably affected by unknown levels of noise. We show the applicability and robustness of our method on real data by testing it on the ISOMAP face database, MNIST, and the Wisconsin Breast Cancer Dataset.
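
The gap-detection step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact detection rule, so the code below rests on assumptions: it takes the eigenvalues of the metric at each point as given, assumes the large eigenvalues correspond to directions along the manifold, and places the gap at the largest ratio between consecutive sorted eigenvalues. The function names `estimate_local_dimension` and `estimate_global_dimension` are hypothetical, and the spectra are synthetic stand-ins rather than output of a trained QCML model.

```python
import numpy as np


def estimate_local_dimension(eigenvalues, eps=1e-12):
    """Count the eigenvalues that lie above the largest multiplicative gap.

    Assumption (not taken from the paper): the gap is located at the largest
    ratio between consecutive eigenvalues sorted in descending order.
    """
    # Clip to eps so that zero eigenvalues do not cause division by zero.
    ev = np.clip(np.asarray(eigenvalues, dtype=float), eps, None)
    ev = np.sort(ev)[::-1]  # descending order
    if ev.size < 2:
        return int(ev.size)
    ratios = ev[:-1] / ev[1:]  # ratios[i] = ev[i] / ev[i + 1]
    # A gap after index i means i + 1 eigenvalues sit above it.
    return int(np.argmax(ratios)) + 1


def estimate_global_dimension(eigenvalues_per_point, how="median"):
    """Aggregate local estimates into one global estimate (median or mode)."""
    local = np.array(
        [estimate_local_dimension(ev) for ev in eigenvalues_per_point]
    )
    if how == "median":
        # int() truncates if the median falls halfway between two integers.
        return int(np.median(local))
    if how == "mode":
        return int(np.bincount(local).argmax())
    raise ValueError(f"unknown aggregation: {how!r}")


# Synthetic spectra: two large eigenvalues and one small one per point,
# mimicking a 2-dimensional manifold embedded in 3 dimensions.
rng = np.random.default_rng(0)
large = rng.uniform(0.8, 1.2, size=(100, 2))
small = rng.uniform(0.001, 0.01, size=(100, 1))
spectra = np.hstack([large, small])

print(estimate_global_dimension(spectra, how="median"))
print(estimate_global_dimension(spectra, how="mode"))
```

Using a ratio rather than a difference makes the rule independent of the overall scale of the eigenvalues. The median and mode aggregations mirror the global estimates mentioned in the figure captions.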

Paper Structure (15 sections, 13 equations, 7 figures, 1 algorithm)


Figures (7)

  • Figure 1: Two configurations are shown for a data set $X$ consisting of $T=2500$ points uniformly distributed on the unit sphere with different levels of noise. (a,c) Scatter plot of the point cloud $X_A$ for (a) noise = 0, and (c) noise = 0.2, for two corresponding matrix configurations $A$ trained with Hilbert space dimension $N=3$. The original data set is overlaid in red. Darker points correspond to lower relative error energy $E_0$. (b,d) Spectral gaps for (b) noise = 0 and (d) noise = 0.2. The $x$-axis corresponds to points $x\in X_A$, and the eigenvalues of the quantum metric $g(x)$ are plotted on the $y$-axis.
  • Figure 2: Intrinsic dimension estimates for the unit sphere $S^2$ as a function of noise level. Varying data set sizes of (a) $T=250$, (b) $T=2500$, (c) $T=25000$ points are tested. For the QCML estimator, the average estimate across all $T$ points is plotted. A slight degradation in the estimate for the QCML estimator is noticeable for noise > 0.15, especially in the case of $T=250$, but it is otherwise robust when compared to other methods.
  • Figure 3: Intrinsic dimension estimates for $T=2500$ points on three higher-dimensional benchmark manifolds [Campadelli]: the 17-dimensional hypercube $M10b$, the 10-dimensional $M_{\beta}$ manifold embedded into $D=40$ dimensions, and the 18-dimensional manifold $MN_1$ embedded non-linearly into $D=72$ dimensions. In the boxplots (a-c) the $i$-th box represents the distribution of the eigenvalue $e_i$ across all $T=2500$ points. The outliers have been omitted from the plot for clarity. The plots (d-f) show the intrinsic dimension estimates for each manifold as functions of the noise parameter. In these examples a global estimate of dimension for the QCML estimator was obtained by taking the median of the local dimension estimates.
  • Figure 4: (a) Examples of images from the ISOMAP face database, (b) Spectral gap for ISOMAP, (c) Examples of digit "1" in MNIST, (d) Spectral gap for MNIST digit "1", (e) Distribution of dimension estimates for MNIST digit "1". For ISOMAP, the distribution of dimension estimates is concentrated at dimension $d=3$, so the histogram is not shown.
  • Figure 5: Intrinsic dimension estimates for the Wisconsin Breast Cancer Dataset using a QCML estimator of dimension $N=16$ and quantum fluctuation weight $w=0.1$ in the loss function. (a) Spectral gap with zero noise. Outliers omitted for clarity. (b) Intrinsic dimension estimates for different estimators as a function of noise. For the QCML estimator, a global estimate of dimension is obtained by taking the mode of the local estimates.
  • ...and 2 more figures