Compressibility Barriers to Neighborhood-Preserving Data Visualizations
Szymon Snoeck, Noah Bergam, Nakul Verma
TL;DR
This work reframes low-dimensional data visualization as the problem of embedding the input neighborhood graph into a metric space of doubling dimension $d$ while preserving neighborhoods, under a notion of $\alpha$-preservation. It proves that a typical $n$-node graph requires $d = \Theta(\log n)$, and that even sparse regular graphs typically require $d = \Omega(\log n/\log\log n)$. When embeddings are restricted to normed spaces, especially $\ell_2$, the barrier for general graphs strengthens to $\Omega(n)$ at $\alpha=1$, and for $\alpha>1$ the required dimension exhibits phase-transition behavior tied to spectral properties of the graph. Even graphs with pronounced cluster structure (planted partitions) typically require $d=\Omega(\log n)$, which indicates strong limits on constant-dimensional neighborhood preservation in practice. Together, the results reveal fundamental geometric barriers to faithful neighborhood preservation in low dimensions, with concrete bounds expressed via clique partitions, neighborhood partitions, and graph spectra. They also point to refined directions for practical visualization methods, such as approximate preservation and structural graph statistics.
Abstract
To what extent is it possible to visualize high-dimensional data in two- or three-dimensional plots? We reframe this question in terms of embedding $n$-vertex graphs (representing the neighborhood structure of the input points) into metric spaces of low doubling dimension $d$ in a way that keeps neighbors close and non-neighbors far. This notion of neighbor preservation can be understood as a considerably weaker embedding constraint than near-isometry, yet it is similarly demanding in terms of how the minimum required dimension scales with the number of points. We show that for an overwhelming fraction of graphs, $d = \Theta(\log n)$ is both necessary and sufficient for neighbor preservation. Even sparse regular graphs, which represent more restricted neighborhood connectivity structures, typically require $d = \Omega(\log n / \log\log n)$. The landscape changes dramatically when embedding into normed spaces: general graphs become exponentially harder to embed, requiring $d = \Omega(n)$, while sparse regular graphs continue to admit $d = O(\log n)$. Finally, we study the implications of these results for visualizing data with intrinsic cluster structure. We show that graphs produced from a planted partition model with $k$ clusters on $n$ points typically require $d = \Omega(\log n)$, even when the cluster structure is salient. These results challenge the aspiration that constant-dimensional visualizations can faithfully preserve neighborhood structure.
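
To make "neighbors close and non-neighbors far" concrete, the sketch below checks one possible formalization for a Euclidean embedding. The criterion is an assumption made for illustration and is not quoted from the paper: an embedding counts as $\alpha$-preserving if some threshold $r > 0$ exists such that every pair of neighbors lies within distance $r$ and every pair of non-neighbors lies at distance greater than $\alpha r$. The function name `is_alpha_preserving` and its arguments are hypothetical.

```python
import math
from itertools import combinations


def is_alpha_preserving(points, edges, alpha=1.0):
    """Check an assumed notion of alpha-preservation under Euclidean distance.

    Assumption (not taken from the paper): the embedding is alpha-preserving
    if some threshold r > 0 exists with dist <= r for every edge and
    dist > alpha * r for every non-edge.
    """
    edge_set = {frozenset(e) for e in edges}
    max_edge = 0.0            # largest distance between neighbors
    min_non_edge = math.inf   # smallest distance between non-neighbors
    for u, v in combinations(range(len(points)), 2):
        d = math.dist(points[u], points[v])
        if frozenset((u, v)) in edge_set:
            max_edge = max(max_edge, d)
        else:
            min_non_edge = min(min_non_edge, d)
    # Any valid r must be at least max_edge, so a threshold exists exactly
    # when the closest non-neighbors are farther apart than alpha * max_edge.
    return min_non_edge > alpha * max_edge


# A 4-cycle drawn as the unit square: sides have length 1, diagonals sqrt(2).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
```

Under this assumed criterion, the unit-square drawing of the 4-cycle passes at $\alpha = 1$ but fails at $\alpha = 1.5$, because the diagonals (non-neighbors) are only $\sqrt{2}$ times longer than the sides. The check takes $O(n^2)$ distance computations, which is fine for small illustrative graphs.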
