Quantum-inspired Benchmark for Estimating Intrinsic Dimension
Aritra Das, Joseph T. Iosue, Victor V. Albert
TL;DR
Existing intrinsic-dimension estimation (IDE) methods produce inconsistent intrinsic dimension (ID) estimates on real-world data. This work addresses the problem by introducing QuIIEst, a quantum-inspired benchmark comprising infinite families of topologically non-trivial manifolds with known ground-truth IDs. Using Gilmore-Perelomov coherent-state embeddings, the authors generate homogeneous-space manifolds (e.g., Stiefel, Grassmannian, and flag manifolds, and Pauli quotients) and also include non-manifold fractal examples like Hofstadter's butterfly to probe effective dimensionality. They evaluate six IDE methods across these manifolds, showing that standard benchmarks often underrepresent difficulty and that embedding choices and distortions influence estimation accuracy. In particular, some manifolds are harder for IDE methods than spheres with the same ground-truth dimension. The study further analyzes how data statistics and geometry relate to IDE performance and provides a scalable framework for future benchmarking, with plans to release datasets under CC BY 4.0 to advance reproducibility and comparative evaluation in intrinsic-dimension estimation.
Abstract
Machine learning models can generalize well on real-world datasets. According to the manifold hypothesis, this is possible because datasets lie on a latent manifold with small intrinsic dimension (ID). There exist many methods for ID estimation (IDE), but their estimates vary substantially. This warrants benchmarking IDE methods on manifolds that are more complex than those in existing benchmarks. We propose a Quantum-Inspired Intrinsic-dimension Estimation (QuIIEst) benchmark consisting of infinite families of topologically non-trivial manifolds with known ID. Our benchmark stems from a quantum-optical method of embedding arbitrary homogeneous spaces while allowing for curvature modification and additive noise. The IDE methods tested were generally less accurate on QuIIEst manifolds than on existing benchmarks under identical resource allocation. We also observe minimal performance degradation with increasingly non-uniform curvature, underscoring the benchmark's inherent difficulty. As a result of independent interest, we perform IDE on the fractal Hofstadter's butterfly and identify which methods are capable of extracting the effective dimension of a space that is not a manifold.
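To make concrete what benchmarking an IDE method against a known ID involves, the following minimal sketch samples points from a unit 2-sphere embedded in three dimensions (ground-truth ID 2) and applies a TwoNN-style maximum-likelihood estimator based on the ratio of second- to first-nearest-neighbor distances. It is an illustration only and not the QuIIEst code: the estimator is chosen here for brevity, it is not claimed to be one of the methods tested, and the function names, sample size, and seed are all assumptions.

```python
import math
import random


def sample_sphere(n_points, ambient_dim, rng):
    """Draw points uniformly from the unit sphere S^(ambient_dim - 1)."""
    points = []
    for _ in range(n_points):
        # Normalizing an isotropic Gaussian vector gives a uniform direction.
        v = [rng.gauss(0.0, 1.0) for _ in range(ambient_dim)]
        norm = math.sqrt(sum(x * x for x in v))
        points.append([x / norm for x in v])
    return points


def two_nn_estimate(points):
    """TwoNN-style maximum-likelihood ID estimate: N / sum(log(r2 / r1))."""
    log_ratio_sum = 0.0
    for i, p in enumerate(points):
        r1 = r2 = math.inf  # smallest and second-smallest neighbor distances
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)
            if d < r1:
                r1, r2 = d, r1
            elif d < r2:
                r2 = d
        log_ratio_sum += math.log(r2 / r1)
    return len(points) / log_ratio_sum


# Hypothetical settings: 600 points on S^2 in R^3, fixed seed for repeatability.
rng = random.Random(0)
points = sample_sphere(600, 3, rng)
estimate = two_nn_estimate(points)
print(f"ground-truth ID = 2, TwoNN-style estimate = {estimate:.2f}")
```

On a sphere, an estimator of this kind is expected to land close to the ground-truth value. A benchmark such as the one proposed here applies the same comparison to manifolds with less trivial topology and curvature, where the gap between estimate and ground truth can be larger.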
