VERA: Generating Visual Explanations of Two-Dimensional Embeddings via Region Annotation

Pavlin G. Poličar, Blaž Zupan

TL;DR

Visual Explanations via Region Annotation (VERA) is an automatic embedding-annotation approach that generates visual explanations for any two-dimensional embedding. These explanations are as useful as fully-fledged interactive tools on typical exploratory data analysis tasks but require significantly less time and effort from the user.

Abstract

Two-dimensional embeddings obtained from dimensionality reduction techniques, such as MDS, t-SNE, and UMAP, are widely used across various disciplines to visualize high-dimensional data. These visualizations provide a valuable tool for exploratory data analysis, allowing researchers to visually identify clusters, outliers, and other interesting patterns in the data. However, interpreting the resulting visualizations can be challenging, as it often requires additional manual inspection to understand the differences between data points in different regions of the embedding space. To address this issue, we propose Visual Explanations via Region Annotation (VERA), an automatic embedding-annotation approach that generates visual explanations for any two-dimensional embedding. VERA produces informative explanations that characterize distinct regions in the embedding space, allowing users to gain an overview of the embedding landscape at a glance. Unlike most existing approaches, which typically require some degree of manual user intervention, VERA produces static explanations, automatically identifying and selecting the most informative visual explanations to show to the user. We illustrate the usage of VERA on a real-world data set and validate the utility of our approach with a comparative user study. Our results demonstrate that the explanations generated by VERA are as useful as fully-fledged interactive tools on typical exploratory data analysis tasks but require significantly less time and effort from the user.

Paper Structure (19 sections, 4 equations, 7 figures)

Figures (7)

  • Figure 1: We visualize each feature of our fictional Bookworm data set in the typical scatter plot fashion. The point positions are specified by a two-dimensional embedding of the data set. Categorical variables are colored with a discrete colormap, while the two numeric variables are colored with a continuous colormap. Missing values are colored gray.
  • Figure 2: We illustrate the region construction process for a single variable using a synthetic example. (a) Two-dimensional embeddings are often inspected by coloring points by their corresponding feature values. However, the informativeness of this approach is often limited by the color scale, which can be skewed in the presence of outliers. (b) We discretize the numeric values into five bins, resulting in five binary indicator variables. Here, points are colored according to their discretization bin membership. (c) For each of the discretized bins, we extract regions by obtaining a contour from the corresponding KDE. (d) Finally, we merge overlapping regions to obtain a more informative and readable visualization.
  • Figure 3: The contrastive merge. If two base variables contain explanatory variables whose regions have a perfect overlap bipartite matching, we can merge the two base variables into one. This not only reduces the number of panels a user needs to inspect but also reveals correlations between particular variable values.
  • Figure 4: Contrastive layout generation. (a) Generating candidate panels simply involves including all explanatory variables associated with a particular base variable in a separate panel. Each candidate panel is scored and ranked according to three metrics, and a weighted mean rank is computed for each panel. (b) The top $k$ panels with the highest mean rank, indicated in red, are included in the resulting layout.
  • Figure 5: The descriptive merge. In this example, we consider a subset of three explanatory variables (a). In the first step, we split each explanatory variable into disjoint regions (b). Then, we identify spatially overlapping explanatory variables and merge them into explanatory variable groups (c). In the final step, each explanatory variable group is imbued with background explanatory variables (d). In this instance, the middle panel depicting the pink region receives an additional background descriptor, highlighted in red.
  • ...and 2 more figures
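
The region construction steps described in the Figure 2 caption (discretize, estimate a density per bin, extract a region, merge overlapping regions) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. Several choices are assumptions that do not come from the paper: equal-frequency binning, a fixed Gaussian bandwidth, thresholding the density grid at a fraction of its peak as a stand-in for contour extraction, and a Jaccard-overlap threshold as the merge criterion. All function names are hypothetical.

```python
import math
import random


def equal_frequency_bins(values, n_bins=5):
    """Assign each value a bin index in 0..n_bins-1 by rank (assumed binning rule)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = min(rank * n_bins // len(values), n_bins - 1)
    return bins


def kde_on_grid(points, grid_size=30, bandwidth=0.5, extent=(-4.0, 4.0)):
    """Evaluate a Gaussian KDE of 2D points on a square grid (rows are y, columns are x)."""
    lo, hi = extent
    step = (hi - lo) / (grid_size - 1)
    axis = [lo + k * step for k in range(grid_size)]
    norm = 1.0 / (2 * math.pi * bandwidth ** 2 * len(points))
    two_h2 = 2 * bandwidth ** 2
    density = [
        [norm * sum(math.exp(-((gx - px) ** 2 + (gy - py) ** 2) / two_h2)
                    for px, py in points)
         for gx in axis]
        for gy in axis
    ]
    return axis, density


def region_mask(density, level=0.25):
    """Mark grid cells whose density is at least `level` times the peak density.

    A real contour would trace the boundary of this set; the mask is a stand-in.
    """
    peak = max(max(row) for row in density)
    return [[d >= level * peak for d in row] for row in density]


def jaccard(mask_a, mask_b):
    """Intersection-over-union of two boolean grids of equal shape."""
    pairs = [(a, b) for ra, rb in zip(mask_a, mask_b) for a, b in zip(ra, rb)]
    inter = sum(1 for a, b in pairs if a and b)
    union = sum(1 for a, b in pairs if a or b)
    return inter / union if union else 0.0


def merge_overlapping(masks, threshold=0.5):
    """Greedily union each mask into the first earlier region it overlaps enough."""
    merged = []
    for mask in masks:
        for k, existing in enumerate(merged):
            if jaccard(existing, mask) >= threshold:
                merged[k] = [[a or b for a, b in zip(ra, rb)]
                             for ra, rb in zip(existing, mask)]
                break
        else:
            merged.append(mask)
    return merged


# Synthetic embedding: the feature roughly follows the x coordinate.
random.seed(0)
embedding = [(random.uniform(-3, 3), random.uniform(-3, 3)) for _ in range(200)]
feature = [x + random.gauss(0, 0.1) for x, _ in embedding]

bins = equal_frequency_bins(feature, n_bins=5)
masks = []
for b in range(5):
    members = [p for p, bin_index in zip(embedding, bins) if bin_index == b]
    _, density = kde_on_grid(members)
    masks.append(region_mask(density))

regions = merge_overlapping(masks, threshold=0.5)
print(len(masks), "bin regions ->", len(regions), "after merging")
```

Working with binary bin-membership indicators rather than raw values means a few outliers only affect the bins they fall into, instead of stretching a continuous color scale across the whole plot, which is the limitation noted in panel (a).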