ENS-t-SNE: Embedding Neighborhoods Simultaneously t-SNE

Jacob Miller, Vahan Huroyan, Raymundo Navarrete, Md Iqbal Hossain, Stephen Kobourov

TL;DR

ENS-t-SNE extends t-SNE to embed data in 3D while producing multiple 2D projections that preserve local neighborhood structure for different subspaces, enabling seamless cross-view interpretation. The method optimizes a summed objective $\tilde{C}$ across projections and learns projection matrices, allowing coherent transitions between views through 3D rotations. Empirical results on synthetic and real-world datasets show improved neighborhood preservation and stability in the projections compared to MPSE and standard t-SNE, highlighting its utility for multi-perspective data exploration. The approach provides a practical, extensible tool for revealing diverse patterns within the same high-dimensional data and includes public code and demonstrations for broader adoption.
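To make the idea of a summed objective across projections concrete, here is a minimal NumPy sketch. It is not the authors' implementation, and the exact form of the objective is not given in this summary. The sketch assumes the objective is the sum, over views, of the standard t-SNE Kullback-Leibler divergence between a per-view target affinity matrix and the Student-t affinities of the 3D points after a 2x3 projection. The names `student_t_affinities`, `summed_kl_objective`, `projections`, and `P_list` are hypothetical.

```python
import numpy as np

def student_t_affinities(Z):
    """Student-t affinities q_ij (as in standard t-SNE) for 2D points Z of shape (n, 2)."""
    sq_dists = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    W = 1.0 / (1.0 + sq_dists)
    np.fill_diagonal(W, 0.0)      # a point is not its own neighbor
    return W / W.sum()            # normalize over all pairs

def summed_kl_objective(Y, projections, P_list, eps=1e-12):
    """Assumed objective: sum over views m of KL(P^m || Q^m).

    Y           : (n, 3) shared 3D embedding
    projections : list of (2, 3) matrices, one per view (hypothetical layout)
    P_list      : list of (n, n) target affinity matrices, one per view
    """
    total = 0.0
    for proj, P in zip(projections, P_list):
        Q = student_t_affinities(Y @ proj.T)   # affinities of the 2D view
        mask = P > 0                           # skip zero entries (0 * log 0 = 0)
        total += np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps)))
    return total

# Small demonstration with random data (illustrative only).
rng = np.random.default_rng(0)
Y = rng.normal(size=(20, 3))
projections = [rng.normal(size=(2, 3)) for _ in range(3)]
# Targets built from the projected views themselves, so the objective is near zero.
P_list = [student_t_affinities(Y @ proj.T) for proj in projections]
print(summed_kl_objective(Y, projections, P_list))
```

In an actual optimizer, both `Y` and the projection matrices would be updated by gradient descent on this sum. The sketch only evaluates the objective and leaves out the high-dimensional affinity computation (perplexity search) that would produce each target matrix.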

Abstract

When visualizing a high-dimensional dataset, dimension reduction techniques are commonly employed which provide a single 2-dimensional view of the data. We describe ENS-t-SNE: an algorithm for Embedding Neighborhoods Simultaneously that generalizes the t-Stochastic Neighborhood Embedding approach. By using different viewpoints in ENS-t-SNE's 3D embedding, one can visualize different types of clusters within the same high-dimensional dataset. This enables the viewer to see and keep track of the different types of clusters, which is harder to do when providing multiple 2D embeddings, where corresponding points cannot be easily identified. We illustrate the utility of ENS-t-SNE with real-world applications and provide an extensive quantitative evaluation with datasets of different types and sizes.

Paper Structure

This paper contains 18 sections, 22 equations, 20 figures, 1 table, 1 algorithm.

Figures (20)

  • Figure 1: The ENS-t-SNE embedding of a $400$-point dataset with three perspectives; in each perspective there are two different clusters. We have encoded each perspective's clusters using visual channels: color (orange or blue), shape (square or circle), and texture (filled or not filled). (a) shows the three-dimensional embedding of the dataset, (b) shows the first view, where points are clustered by color, (c) shows the second view, where points are clustered by shape, and (d) shows the third view, where points are clustered by texture.
  • Figure 2: ENS-t-SNE embedding of a clustered dataset, created according to Section \ref{sec:cluster_construction}, where the number of perspectives is $M = 3$, the number of datapoints is $N = 1000$, and the number of clusters per perspective is $NC_1 = 2$, $NC_2 = 3$, and $NC_3 = 4$. Fig. \ref{fig:clustering_234_400}(a) shows a snapshot of the 3D ENS-t-SNE embedding and Figs. \ref{fig:clustering_234_400}(b)-(d) show the 2D projections of the 3D ENS-t-SNE embedding. The original clusters are shown in texture (filled or not), shape (square, triangle, and circle), and color (blue, orange, green, red). ENS-t-SNE is able to recover the clusters and create an embedding which respects all the different types of clusters.
  • Figure 3: MPSE applied to the synthetic data from Section \ref{sec:cluster_construction}. Recall that we should see clusters based on texture, shape, and color in the three views. MPSE fails to capture this information, mixing clusters in the color and shape views. In general, these clusters are better separated in Fig. \ref{fig:clustering_234_400}.
  • Figure 4: The Palmer's Penguins dataset captured by MPSE. (a): The full 3D embedding. (b): The projection capturing physical characteristics, encoded by color. (c): The projection capturing penguin sex, encoded by shape. MPSE mixes the blue and orange clusters, as well as the square and circle clusters.
  • Figure 5: The Palmer's Penguins dataset embedded by MDS, t-SNE, and UMAP in 3D. While these algorithms are not directly comparable to MPSE and ENS-t-SNE, they are provided for visual comparison. Embeddings of the remaining datasets by these algorithms are in supplemental materials.
  • ...and 15 more figures