t-viSNE: Interactive Assessment and Interpretation of t-SNE Projections

Angelos Chatzimparmpas, Rafael M. Martins, Andreas Kerren

TL;DR

t-viSNE introduces an interactive, view-coordinated system to open the black box of t-SNE, enabling hyper-parameter exploration, global and local quality assessment, and interpretation of projection patterns through novel views such as the Shepard Heatmap, Density/Remaining Cost mappings, Neighborhood Preservation, and the Dimension Correlation tool. The approach is validated with hypothetical and real-data use cases in cancer and diabetes, and a comparative user study against Google's Embedding Projector demonstrates higher perceived support and similar efficiency. By making t-SNE’s internal factors (densities, costs, neighborhood preservation) visible and actionable, t-viSNE aims to improve trust, interpretability, and practical utility of t-SNE visualizations in high-dimensional data analysis. The work also discusses design choices, limitations, and avenues for future work to broaden applicability and enhance user experience in visual analytics for dimensionality reduction.

Abstract

t-Distributed Stochastic Neighbor Embedding (t-SNE) for the visualization of multidimensional data has proven to be a popular approach, with successful applications in a wide range of domains. Despite their usefulness, t-SNE projections can be hard to interpret or even misleading, which hurts the trustworthiness of the results. Understanding the details of t-SNE itself and the reasons behind specific patterns in its output may be a daunting task, especially for non-experts in dimensionality reduction. In this work, we present t-viSNE, an interactive tool for the visual exploration of t-SNE projections that enables analysts to inspect different aspects of their accuracy and meaning, such as the effects of hyper-parameters, distance and neighborhood preservation, densities and costs of specific neighborhoods, and the correlations between dimensions and visual patterns. We propose a coherent, accessible, and well-integrated collection of different views for the visualization of t-SNE projections. The applicability and usability of t-viSNE are demonstrated through hypothetical usage scenarios with real data sets. Finally, we present the results of a user study where the tool's effectiveness was evaluated. By bringing to light information that would normally be lost after running t-SNE, we hope to support analysts in using t-SNE and to make its results easier to understand.

Paper Structure

This paper contains 23 sections, 4 equations, 12 figures, 2 tables.

Figures (12)

  • Figure 1: Visual inspection of t-SNE results with t-viSNE: (a) a panel for uploading data sets, choosing between two execution modes (grid search or a single set of parameters), and storing new (or loading previous) executions; (b) overview of the results with data-specific labels encoded with categorical colors; (c) the Shepard Heatmap of all pairwise distances; (d) the histogram with the Density and Remaining Cost distributions; (e) list of available projections, ranked by quality; (f) the main scatterplot view representing the Density of neighborhoods in the original high-dimensional space and the Remaining Cost of each point; (g) the Neighborhood Preservation bar chart/line plot; (h) control elements for the different interaction modes of the tool; (i) the visual mapping panel with a variety of options for the users such as an annotation tool for saving notes for multi-session analyses; (j) the Dimension Correlation bar chart visualizing the correlations between the data dimensions; and (k) the Adaptive PCP plot representing the most important dimensions.
  • Figure 2: Hyper-parameter exploration (presented in a dialog at the beginning of an analytical session), with 25 representative projections from a pool of 500 alternatives obtained through a grid search. Five quality metrics, plus their Quality Metrics Average (QMA), are also displayed to support the visual analysis. The thumbnails are sorted according to the QMA and ordered row-wise from top to bottom. The currently-selected projection is indicated by a red box (top row, third column).
  • Figure 3: The importance of the visual mapping of Density, using three $5$-D Gaussian clusters with varying standard deviations and slight overlap. (a) A simple linear projection using PCA shows the clusters' varying density. (b) A t-SNE projection shows all clusters with roughly the same size. (c) t-viSNE accurately shows the densities of the clusters (color-encoded) and helps us identify, for example, that clusters 2 and 3 are separate.
  • Figure 4: Investigation of a group of points from the well-known Iris data set [Dua2017Machine]. (a) The points' sizes indicate that a region in-between the species versicolor and virginica has the highest Remaining Cost. (b) The points have similar dimension values, but are classified as different species. (c) Neighborhood Preservation starts high (for close neighbors), but steadily decreases.
  • Figure 5: The Dimension Correlation tool. (a) Nearby points are projected to a user-drawn path, creating a user-induced ordering. Here 7, 3, 4, and so on are data instance IDs. (b) The user-induced ordering is compared to dimension-specific orderings using a correlation measure. (c) Results are shown in the lengths of bars, ordered by the absolute value of the correlation (with highest on top). Note that if the same polyline is drawn by the user in the opposite direction over a pattern, then the signs of the correlations change but not their magnitude.
  • ...and 7 more figures
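
The Dimension Correlation procedure described in the Figure 5 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the t-viSNE implementation: the caption only mentions "a correlation measure", so Spearman rank correlation is assumed here, and the function names (project_onto_path, average_ranks, dimension_correlation) and the radius parameter that decides which points count as "nearby" are hypothetical. The sketch also assumes that consecutive path vertices are distinct.

```python
import numpy as np


def project_onto_path(points, path):
    """Return (arc-length position, distance to path) for each 2-D point."""
    points = np.asarray(points, dtype=float)
    path = np.asarray(path, dtype=float)
    starts, ends = path[:-1], path[1:]
    seg = ends - starts                              # segment vectors, (s, 2)
    seg_len = np.linalg.norm(seg, axis=1)            # segment lengths, (s,)
    offsets = np.concatenate(([0.0], np.cumsum(seg_len)[:-1]))
    # Parameter t in [0, 1] of the closest position on every segment.
    rel = points[:, None, :] - starts[None, :, :]    # (n, s, 2)
    t = np.clip((rel * seg).sum(axis=-1) / seg_len**2, 0.0, 1.0)
    closest = starts + t[..., None] * seg            # (n, s, 2)
    dist = np.linalg.norm(points[:, None, :] - closest, axis=-1)
    best = dist.argmin(axis=1)                       # nearest segment per point
    rows = np.arange(len(points))
    position = offsets[best] + t[rows, best] * seg_len[best]
    return position, dist[rows, best]


def average_ranks(values):
    """1-based ranks where tied values share their average rank."""
    values = np.asarray(values, dtype=float).ravel()
    _, inverse, counts = np.unique(
        values, return_inverse=True, return_counts=True
    )
    group_rank = np.cumsum(counts) - (counts - 1) / 2.0
    return group_rank[inverse]


def dimension_correlation(projection, data, path, radius):
    """Correlate the user-induced ordering with each data dimension.

    Returns (dimension index, correlation) pairs sorted by absolute
    correlation, highest first, mirroring the bar order in the caption.
    """
    data = np.asarray(data, dtype=float)
    position, dist = project_onto_path(projection, path)
    near = dist <= radius                 # assumed definition of "nearby"
    if near.sum() < 2:
        return []
    order_rank = average_ranks(position[near])
    result = []
    for j in range(data.shape[1]):
        dim_rank = average_ranks(data[near, j])
        if order_rank.std() == 0 or dim_rank.std() == 0:
            rho = 0.0                     # constant input: treat as uncorrelated
        else:
            rho = float(np.corrcoef(order_rank, dim_rank)[0, 1])
        result.append((j, rho))
    return sorted(result, key=lambda item: -abs(item[1]))


if __name__ == "__main__":
    # Toy projection: ten points scattered closely around a horizontal line.
    proj = np.array([[i, 0.1 * (-1) ** i] for i in range(10)], dtype=float)
    # Dimension 0 grows along the line, dimension 1 shrinks, dimension 2 is flat.
    data = np.column_stack([np.arange(10.0), -np.arange(10.0) ** 2, np.ones(10)])
    path = np.array([[0.0, 0.0], [4.5, 0.0], [9.0, 0.0]])
    print(dimension_correlation(proj, data, path, radius=0.5))
    print(dimension_correlation(proj, data, path[::-1], radius=0.5))
```

Drawing the same path in the opposite direction (path[::-1] in the toy example) reverses the user-induced ordering, so every correlation changes sign while its magnitude stays the same, which matches the note at the end of the caption.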