A general framework for adaptive nonparametric dimensionality reduction
Antonio Di Noia, Federico Ravenda, Antonietta Mira
TL;DR
The paper addresses hyper-parameter tuning in neighborhood-based nonparametric dimensionality reduction by building on ABIDE, a recently proposed intrinsic-dimension estimator that jointly outputs a global dimension $d^*$ and per-point neighborhood sizes $k^*$. These outputs yield locally uniform neighborhoods that adapt to the data geometry, and they are integrated into LLE, spectral clustering, and UMAP to create adaptive variants (LLE$^*$, SC$^*$, UMAP$^*$) with improved unsupervised embeddings and supervised classification performance. ABIDE is shown to be consistent and asymptotically normal, which gives the framework a theoretical foundation. Experiments on real and synthetic datasets (Iris, MNIST, Manifolds, News Articles) show improvements in clustering metrics and visualization while reducing the need for manual hyper-parameter tuning. Overall, the framework offers a principled, general approach to adaptive nonparametric dimensionality reduction for data visualization and learning tasks.
Abstract
Dimensionality reduction is a fundamental task in modern data science. Several projection methods specifically tailored to take into account the non-linearity of the data via local embeddings have been proposed. Such methods are often based on local neighbourhood structures and require tuning two hyper-parameters: the number of neighbours that defines this local structure, and the dimensionality of the lower-dimensional space onto which the data are projected. These choices critically influence the quality of the resulting embedding. In this paper, we exploit a recently proposed intrinsic dimension estimator that also returns the optimal locally adaptive neighbourhood sizes according to some desirable criteria. In principle, this adaptive framework can be employed to perform an optimal hyper-parameter tuning of any dimensionality reduction algorithm that relies on local neighbourhood structures. Numerical experiments on both real-world and simulated datasets show that the proposed method can be used to significantly improve well-known projection methods when employed for various learning tasks, with improvements measurable through both quantitative metrics and the quality of low-dimensional visualizations.
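To make the idea of locally adaptive neighbourhood sizes concrete, the following minimal sketch builds a nearest-neighbour graph in which each point has its own number of neighbours, instead of one global value shared by all points. It is an illustration only, not the estimator or any of the adaptive methods described here: the function name `adaptive_knn_graph`, the toy data, and the values in `k_star` are hypothetical, and the per-point sizes are simply assumed to be given rather than estimated.

```python
import math
from typing import Dict, List, Sequence


def adaptive_knn_graph(
    points: Sequence[Sequence[float]],
    k_per_point: Sequence[int],
) -> Dict[int, List[int]]:
    """For each point i, return the indices of its k_per_point[i] nearest neighbours."""
    if len(points) != len(k_per_point):
        raise ValueError("need exactly one neighbourhood size per point")
    n = len(points)
    graph: Dict[int, List[int]] = {}
    for i in range(n):
        k = k_per_point[i]
        if not 1 <= k <= n - 1:
            raise ValueError(f"k for point {i} must be in [1, {n - 1}]")
        # Rank all other points by Euclidean distance to point i (ties broken by index).
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (math.dist(points[i], points[j]), j),
        )
        graph[i] = others[:k]
    return graph


# Toy data: a tight cluster of three points plus two isolated points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (9.0, 0.0)]
# Hypothetical per-point sizes, assumed here to come from an adaptive estimator.
k_star = [2, 2, 2, 1, 1]
graph = adaptive_knn_graph(points, k_star)
```

A graph of this kind is the common input to neighbourhood-based methods, which is why, in principle, per-point sizes can replace the single global number of neighbours. The brute-force distance ranking is quadratic in the number of points and is used only to keep the sketch short.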
