Attraction-Repulsion Spectrum in Neighbor Embeddings
Jan Niklas Böhm, Philipp Berens, Dmitry Kobak
TL;DR
This paper describes an attraction-repulsion spectrum for neighbor embeddings: the balance between attractive forces along $k$NN edges and repulsive forces between all points determines whether an embedding better captures continuous or discrete structure. By varying the exaggeration parameter $\rho$ in t-SNE and placing UMAP and ForceAtlas2 (FA2) on the resulting spectrum, it shows that stronger attraction better represents continuous manifold structure, while stronger repulsion better represents discrete cluster structure and yields higher $k$NN recall. The authors relate the high-attraction extreme of the spectrum to Laplacian Eigenmaps and show that UMAP’s negative sampling strongly lowers the effective repulsion, so that UMAP corresponds to t-SNE with increased attraction; FA2 corresponds to even stronger attraction due to its non-decaying attractive forces. In practice, the results can guide the choice of method according to the data’s structure (continuous trajectories versus discrete clusters) and show how optimization and sampling schemes affect embedding geometry.
Abstract
Neighbor embeddings are a family of methods for visualizing complex high-dimensional datasets using $k$NN graphs. To find the low-dimensional embedding, these algorithms combine an attractive force between neighboring pairs of points with a repulsive force between all points. One of the most popular examples of such algorithms is t-SNE. Here we empirically show that changing the balance between the attractive and the repulsive forces in t-SNE using the exaggeration parameter yields a spectrum of embeddings, which is characterized by a simple trade-off: stronger attraction can better represent continuous manifold structures, while stronger repulsion can better represent discrete cluster structures and yields higher $k$NN recall. We find that UMAP embeddings correspond to t-SNE with increased attraction; mathematical analysis shows that this is because the negative sampling optimisation strategy employed by UMAP strongly lowers the effective repulsion. Likewise, ForceAtlas2, commonly used for visualizing developmental single-cell transcriptomic data, yields embeddings corresponding to t-SNE with the attraction increased even more. At the extreme of this spectrum lie Laplacian Eigenmaps. Our results demonstrate that many prominent neighbor embedding algorithms can be placed onto the attraction-repulsion spectrum, and highlight the inherent trade-offs between them.
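To make the trade-off concrete, the following is a minimal sketch of how an exaggeration factor enters the standard t-SNE gradient, $4 \sum_j (\rho\, p_{ij} - q_{ij})\, w_{ij}\, (y_i - y_j)$ with $w_{ij} = 1/(1 + \lVert y_i - y_j \rVert^2)$ and $q_{ij} = w_{ij} / \sum_{k \neq l} w_{kl}$. It is illustrative only and is not the authors' implementation: the function name `tsne_forces` is hypothetical, and the dense $n \times n$ affinity matrix is an assumption made for brevity, whereas practical implementations use sparse $k$NN affinities and approximate the repulsive term.

```python
import numpy as np


def tsne_forces(Y, P, rho=1.0):
    """Split the t-SNE gradient into attractive and repulsive parts.

    Hypothetical helper for illustration (dense, O(n^2) memory).
    Y   : (n, d) array of embedding coordinates.
    P   : (n, n) symmetric affinity matrix with zero diagonal, summing to 1.
    rho : exaggeration factor multiplying the attractive term.
    """
    diff = Y[:, None, :] - Y[None, :, :]           # y_i - y_j, shape (n, n, d)
    W = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))   # Cauchy kernel w_ij
    np.fill_diagonal(W, 0.0)                       # exclude self-pairs
    Q = W / W.sum()                                # low-dimensional similarities q_ij

    # Attractive part: only pairs with p_ij > 0 contribute, scaled by rho.
    attraction = 4.0 * rho * np.einsum("ij,ijk->ik", P * W, diff)
    # Repulsive part: acts between all pairs and does not depend on rho.
    repulsion = -4.0 * np.einsum("ij,ijk->ik", Q * W, diff)
    return attraction, repulsion


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Y = rng.normal(size=(5, 2))
    P = rng.random((5, 5))
    P = P + P.T
    np.fill_diagonal(P, 0.0)
    P /= P.sum()

    for rho in (1.0, 4.0):
        att, rep = tsne_forces(Y, P, rho)
        # Ratio of total attractive to total repulsive force magnitude.
        print(rho, np.linalg.norm(att) / np.linalg.norm(rep))
```

The full gradient is `attraction + repulsion`, and a gradient-descent step subtracts it from `Y`. Increasing `rho` scales only the attractive part, which is one way to read the spectrum described above: larger values shift the balance toward attraction, while values below 1 shift it toward repulsion.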
