Preserving clusters and correlations: a dimensionality reduction method for exceptionally high global structure preservation
Jacob Gildenblat, Jens Pahnke
TL;DR
PCC introduces a global correlation objective to dimensionality reduction (DR), explicitly aiming to preserve the global arrangement of the data by aligning high- and low-dimensional distances to reference points. It couples this with a local structure objective based on clustering observability, learned via a linear classifier on the low-dimensional embedding. The method achieves state-of-the-art global structure (GS) preservation, outperforms many existing DR approaches on multiple datasets, and can augment UMAP either through PCUMAP or via initialization strategies, with demonstrated benefits in medical imaging contexts. The result is a simple, practical framework for improving global fidelity in DR while maintaining competitive local clustering behavior, with implications for visualization and downstream analysis in life sciences and imaging.
Abstract
We present Preserving Clusters and Correlations (PCC), a novel dimensionality reduction (DR) method that achieves state-of-the-art global structure (GS) preservation while maintaining competitive local structure (LS) preservation. It optimizes two objectives: a GS preservation objective that preserves an approximation of Pearson and Spearman correlations between high- and low-dimensional distances, and an LS preservation objective that ensures clusters in the high-dimensional data are separable in the low-dimensional data. In addition, we show the correlation objective can be combined with UMAP to significantly improve its GS preservation with minimal degradation of the LS. We quantitatively benchmark PCC against existing methods, where it shows superior GS preservation, and demonstrate its utility in medical imaging.
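To make the GS objective concrete, the following is a minimal NumPy sketch of a correlation loss between high- and low-dimensional distances to a set of reference points. It is an illustration under stated assumptions, not the authors' implementation: the function names, the choice of Euclidean distance, the random selection of reference points, and the `1 - mean correlation` form of the loss are all hypothetical, and only the Pearson term is shown (the Spearman approximation is omitted).

```python
import numpy as np

def distances_to_references(points, references):
    # Euclidean distance from every point to every reference point.
    # Result has shape (n_points, n_references).
    diff = points[:, None, :] - references[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def pearson_per_reference(d_high, d_low, eps=1e-12):
    # Pearson correlation computed column-wise: one value per reference point.
    a = d_high - d_high.mean(axis=0)
    b = d_low - d_low.mean(axis=0)
    num = (a * b).sum(axis=0)
    den = np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0)) + eps
    return num / den

def correlation_loss(x_high, x_low, ref_idx):
    # Hypothetical loss form: 1 minus the mean Pearson correlation between
    # high- and low-dimensional distances to the same reference points.
    d_high = distances_to_references(x_high, x_high[ref_idx])
    d_low = distances_to_references(x_low, x_low[ref_idx])
    return 1.0 - pearson_per_reference(d_high, d_low).mean()

rng = np.random.default_rng(0)

# Toy data: a 2-D point cloud embedded isometrically into 50 dimensions,
# so the original 2-D coordinates are a distance-preserving "embedding".
x_low_ideal = rng.normal(size=(200, 2))
q, _ = np.linalg.qr(rng.normal(size=(50, 2)))  # q has orthonormal columns
x_high = x_low_ideal @ q.T

ref_idx = rng.choice(200, size=10, replace=False)  # assumed: random references

loss_isometric = correlation_loss(x_high, x_low_ideal, ref_idx)
loss_random = correlation_loss(x_high, rng.normal(size=(200, 2)), ref_idx)
print(loss_isometric, loss_random)
```

In this toy setup the distance-preserving embedding yields a loss close to zero, while an unrelated random embedding yields a loss close to one. In a trainable version, the same quantity would be written in an automatic-differentiation framework and minimized with respect to the low-dimensional coordinates.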
