Visualizing Spatial Semantics of Dimensionally Reduced Text Embeddings

Wei Liu, Chris North, Rebecca Faust

TL;DR

We propose a gradient-based method for visualizing the spatial semantics of dimensionally reduced text embeddings, and we design a visualization system that incorporates spatial word clouds into the document projection space to illustrate the impactful text features.

Abstract

Dimension reduction (DR) can transform high-dimensional text embeddings into a 2D visual projection facilitating the exploration of document similarities. However, the projection often lacks connection to the text semantics, due to the opaque nature of text embeddings and non-linear dimension reductions. To address these problems, we propose a gradient-based method for visualizing the spatial semantics of dimensionally reduced text embeddings. This method employs gradients to assess the sensitivity of the projected documents with respect to the underlying words. The method can be applied to existing DR algorithms and text embedding models. Using these gradients, we designed a visualization system that incorporates spatial word clouds into the document projection space to illustrate the impactful text features. We further present three usage scenarios that demonstrate the practical applications of our system to facilitate the discovery and interpretation of underlying semantics in text projections.

Paper Structure (20 sections, 3 equations, 3 figures, 1 algorithm)

Figures (3)

  • Figure 1: The pipeline for our system. In a forward pass, we first embed the documents into a high-dimensional (HD) space and then project them to two dimensions with DR. Next, we perform a backward pass through the pipeline using autodiff, which calculates the gradients of the 2D embeddings with respect to the document words. Finally, the 2D embeddings and gradients are passed to the visualization system to create the visualizations.
  • Figure 2: Comparing DR algorithms on a collection of news articles about sports and tech. (a) shows the spatial word cloud for the DR space generated by MDS. (b) shows the spatial word cloud for the DR space generated by t-SNE. We see that t-SNE identifies a strong central topic for the sports articles ("tennis"). MDS still picks up on this central topic but shows increased focus on subtopics within the sports articles, e.g., "champion".
  • Figure 3: Spatial word clouds generated with attention values. (a) shows the attention-based cloud for the DR space generated by MDS. (b) shows the attention-based cloud for the DR space generated by t-SNE. Unlike the gradient-based clouds in Figure 2, the impactful words identified by attention remain constant between DR algorithms, failing to explain the impact of the DR on the space.
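
The forward and backward passes described in Figure 1 can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the toy word vectors in `WORD_VECS`, the fixed map `PROJ` standing in for a DR algorithm, the weighted-mean document embedding, and the use of central finite differences in place of autodiff. It also assumes each word appears once per document. The sketch only shows the idea of measuring how sensitive a document's 2D position is to each of its words.

```python
import math

# Hypothetical toy word vectors (4-D); a real system would use a text embedding model.
WORD_VECS = {
    "tennis":   [0.9, 0.1, 0.0, 0.2],
    "champion": [0.7, 0.3, 0.1, 0.0],
    "software": [0.0, 0.8, 0.6, 0.1],
    "the":      [0.1, 0.1, 0.1, 0.1],
}

# Hypothetical fixed 4-D -> 2-D linear map; stands in for a DR algorithm.
PROJ = [
    [1.0, -0.5, 0.2, 0.0],
    [0.3, 0.9, -0.4, 0.1],
]


def embed(words, weights):
    """Forward pass, step 1: document embedding as a weighted mean of word vectors."""
    dim = len(PROJ[0])
    total = sum(weights)
    return [
        sum(w * WORD_VECS[t][d] for t, w in zip(words, weights)) / total
        for d in range(dim)
    ]


def project(hd):
    """Forward pass, step 2: nonlinear 2-D projection (tanh of a linear map)."""
    return [math.tanh(sum(r * x for r, x in zip(row, hd))) for row in PROJ]


def word_gradients(words, eps=1e-5):
    """Backward pass stand-in: d(2-D position) / d(word weight) per word,
    estimated with central finite differences instead of autodiff."""
    base = [1.0] * len(words)
    grads = {}
    for i, word in enumerate(words):
        up, down = base[:], base[:]
        up[i] += eps
        down[i] -= eps
        p_up = project(embed(words, up))
        p_down = project(embed(words, down))
        grads[word] = [(a - b) / (2 * eps) for a, b in zip(p_up, p_down)]
    return grads


def rank_words(words):
    """Order words by gradient magnitude, i.e. by impact on the 2-D position."""
    grads = word_gradients(words)
    return sorted(grads, key=lambda w: math.hypot(*grads[w]), reverse=True)


if __name__ == "__main__":
    doc = ["tennis", "champion", "the"]
    for word, (gx, gy) in word_gradients(doc).items():
        print(f"{word}: dx={gx:+.4f}, dy={gy:+.4f}")
    print("ranking:", rank_words(doc))
```

Each word's gradient is a 2D vector, so it carries a direction in the projection as well as a magnitude. A spatial word cloud could use the magnitude for word size and the direction for placement, although how the actual system maps gradients to visual properties is not specified here.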