A Novel Approach for Intrinsic Dimension Estimation

Kadir Özçoban, Murat Manguoğlu, Emrullah Fatih Yetkin

TL;DR

This work tackles intrinsic-dimension estimation for efficient dimensionality reduction in large, non-linear data. It introduces a novel pipeline that avoids full eigen-decomposition by combining $tr(\

Abstract

Real-life data have a complex, non-linear structure. These non-linearities, together with a large number of features, often cause problems such as the empty-space phenomenon and the well-known curse of dimensionality. Finding a nearly optimal representation of the dataset in a lower-dimensional space (i.e., dimensionality reduction) offers a practical mechanism for improving the success of machine learning tasks. However, estimating the dimension required for that nearly optimal representation (the intrinsic dimension) can be very costly, particularly when dealing with big data. We propose a highly efficient and robust intrinsic dimension estimation approach for dimensionality reduction methods that relies only on matrix-vector products. An experimental study is also conducted to compare the performance of the proposed method with state-of-the-art approaches.
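To make the "matrix-vector products only" idea concrete, here is a minimal sketch of a Hutchinson-type stochastic trace estimator applied to a sample covariance matrix, whose trace is the total variance of the data. This is a standard matrix-free building block, and whether the paper uses exactly this estimator is an assumption. The function names (`hutchinson_trace`, `covariance_matvec`), the Rademacher probe vectors, and the synthetic data are all illustrative choices that do not come from the paper.

```python
import numpy as np

def hutchinson_trace(matvec, dim, n_v, rng):
    """Estimate tr(A) from n_v products A @ z with random +/-1 probe vectors z."""
    total = 0.0
    for _ in range(n_v):
        z = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe (assumed choice)
        total += z @ matvec(z)                 # z^T A z is an unbiased sample of tr(A)
    return total / n_v

def covariance_matvec(X):
    """Return v -> C v for the sample covariance C of X, without ever forming C."""
    Xc = X - X.mean(axis=0)                    # center the features
    n = Xc.shape[0]
    return lambda v: Xc.T @ (Xc @ v) / (n - 1)

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 40)) @ rng.standard_normal((40, 40))

matvec = covariance_matvec(X)
estimate = hutchinson_trace(matvec, dim=X.shape[1], n_v=200, rng=rng)

# Reference value from the explicit covariance matrix (only feasible for small problems).
exact = np.trace(np.cov(X, rowvar=False))
rel_err = abs(estimate - exact) / exact
print(f"estimated total variance: {estimate:.2f}, exact: {exact:.2f}, relative error: {rel_err:.3f}")
```

Each probe costs two products with the centered data matrix, so the covariance matrix is never formed or decomposed. Increasing `n_v` lowers the variance of the estimate at proportionally higher cost.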

Paper Structure

This paper contains 19 sections, 24 equations, 6 figures, 1 table, 4 algorithms.

Figures (6)

  • Figure 1: Required random vector counts for different $\epsilon$ and $\delta$ values, where $\epsilon$ is the worst-case relative error, guaranteed with probability $1-\delta$
  • Figure 2: The CGLS algorithm applied to the toy problem given in Section \ref{mat}. The eigenvalues / Ritz values obtained after each iteration are shown, and the exact eigenvalues of the covariance matrix appear at the rightmost edge of the figure
  • Figure 3: Effect of the number of Chebyshev polynomials ($p$) and the number of Ritz values on the total variance estimate, with $\epsilon$ and $\delta$ fixed (and consequently $n_v$ fixed as well). The acceptable range of the variance is shown as a shaded area
  • Figure 4: Effect of $\epsilon$ and $\delta$ (and consequently $n_v$) on the total variance estimate, with $p$ and the number of Ritz values fixed. The acceptable range of the variance is shown as a shaded area
  • Figure 5: The estimated ID was obtained 10 times for every $\epsilon$-$\delta$ pair, and the resulting statistical distributions are presented. The gray disks, red line, light red box, and blue line correspond to the individual runs, the median, the interquartile range, and the non-outlier range of the corresponding runs, respectively
  • ...and 1 more figure