Measuring the Data
Ido Cohen
TL;DR
Measuring the Data addresses the challenge of analytically determining the intrinsic dimension $M$ of sparse, nonlinear data. It combines Optimal Transport to generate parametric curves on the data manifold and Koopman Regularization to derive a nonlinear mapping to the intrinsic coordinates, leveraging the fact that the tangent space at a data point is isomorphic to $\mathbb{R}^M$ and that the Koopman eigenfunction space is finite dimensional. The method yields a parsimonious dynamical representation via a minimal set of Koopman eigenfunctions and unit-velocity measurements, enabling data interpolation, compression, denoising, retrieval, and improved neural network interpretability. The results on illustrative examples demonstrate accurate recovery of intrinsic structure and practical utility across multiple data-processing tasks.
Abstract
Measuring the Data analytically finds the intrinsic manifold in big data. First, Optimal Transport generates the tangent space at each data point, from which the intrinsic dimension is revealed. Then, the Koopman Dimensionality Reduction procedure derives a nonlinear transformation from the data to the intrinsic manifold. The Measuring the Data procedure is presented here and supported by encouraging results.
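
To make the first step concrete, the idea that the tangent space at a data point reveals the intrinsic dimension can be illustrated with a minimal sketch. Note that this sketch does not use Optimal Transport: it swaps in plain local PCA over nearest neighbours as a stand-in for the tangent-space construction, and the function name `local_tangent_dimension`, the neighbourhood size `k`, and the `energy` threshold are all assumptions made for illustration, not part of the proposed method.

```python
import numpy as np

def local_tangent_dimension(points, index, k=20, energy=0.95):
    """Estimate the tangent-space dimension at points[index] with local PCA.

    Illustrative stand-in only: the paper builds tangent spaces with
    Optimal Transport, whereas this uses nearest-neighbour differences.
    """
    x0 = points[index]
    # Find the k nearest neighbours of x0, skipping x0 itself (distance 0).
    dists = np.linalg.norm(points - x0, axis=1)
    neighbours = points[np.argsort(dists)[1:k + 1]]
    # Difference vectors to close neighbours approximately span the tangent space.
    diffs = neighbours - x0
    # Singular values measure the spread of the neighbourhood along each direction.
    s = np.linalg.svd(diffs, compute_uv=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)
    # Smallest number of directions capturing the requested share of the spread.
    return int(np.searchsorted(ratio, energy) + 1)

rng = np.random.default_rng(0)

# A circle embedded in R^3: ambient dimension 3, intrinsic dimension 1.
theta = rng.uniform(0.0, 2.0 * np.pi, 2000)
circle = np.stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)], axis=1)
circle_dim = local_tangent_dimension(circle, index=0)

# A tilted plane embedded in R^3: ambient dimension 3, intrinsic dimension 2.
basis = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])
plane = rng.uniform(-1.0, 1.0, size=(2000, 2)) @ basis
plane_dim = local_tangent_dimension(plane, index=0)

print("circle:", circle_dim, "plane:", plane_dim)
```

The estimate depends on `k` and `energy`: a neighbourhood that is too large lets curvature leak into extra directions, and one that is too small becomes noise-dominated. This sensitivity is one reason a principled tangent-space construction is attractive for sparse data.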
