Chart2Vec: A Universal Embedding of Context-Aware Visualizations

Qing Chen, Ying Chen, Ruishi Zou, Wei Shuai, Yi Guo, Jiazhe Wang, Nan Cao

TL;DR

Chart2Vec introduces a context-aware universal embedding for visualizations, learned from declarative chart facts and their contextual co-occurrence in multi-view visualizations. It combines a CFG-based structural representation with Word2Vec-based semantic encoding, trained with a multi-task loss that fuses linear interpolation and triplet learning over four-chart inputs. The approach is evaluated on a curated dataset of 849 data stories and 249 dashboards (6014 visualizations) and shows improvements over ChartSeer and Erato in retrieval and co-occurrence tasks, supported by a user study and ablation analyses. The work targets downstream tasks such as visualization recommendation and storytelling, and suggests that the embedding can generalize to other formats and real-world BI tools.

Abstract

Advances in AI-enabled techniques have accelerated the creation and automation of visualizations in the past decade. However, presenting visualizations in a descriptive and generative format remains a challenge. Moreover, current visualization embedding methods focus on standalone visualizations, neglecting the importance of contextual information for multi-view visualizations. To address this issue, we propose a new representation model, Chart2Vec, to learn a universal embedding of visualizations with context-aware information. Chart2Vec aims to support a wide range of downstream visualization tasks such as recommendation and storytelling. Our model considers both the structural and the semantic information of visualizations in declarative specifications. To enhance the context-aware capability, Chart2Vec employs multi-task learning on both supervised and unsupervised tasks concerning the co-occurrence of visualizations. We evaluate our method through an ablation study, a user study, and a quantitative comparison. The results verify the consistency of our embedding method with human cognition and show its advantages over existing methods.
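
The TL;DR describes a multi-task objective that fuses linear interpolation and triplet learning over four-chart inputs. The summary does not give the exact formulas, so the following is only a minimal sketch of how such a combined loss could look. Everything in it is an assumption made for illustration: the role of each of the four charts (three consecutive charts from one multi-view visualization plus one unrelated chart), the choice of Euclidean distance, and the hypothetical parameters `margin` and `weight`.

```python
import math


def interpolation_loss(e_first, e_middle, e_last):
    # Assumed unsupervised term: the middle chart's embedding should lie
    # near the midpoint of its two neighbours' embeddings.
    midpoint = [(a + b) / 2 for a, b in zip(e_first, e_last)]
    return math.dist(e_middle, midpoint) ** 2


def triplet_loss(e_anchor, e_positive, e_negative, margin=1.0):
    # Standard hinge-style triplet term: the co-occurring chart should be
    # closer to the anchor than the unrelated chart by at least `margin`.
    gap = math.dist(e_anchor, e_positive) - math.dist(e_anchor, e_negative)
    return max(0.0, gap + margin)


def multitask_loss(e1, e2, e3, e_neg, margin=1.0, weight=1.0):
    # Hypothetical fusion: a weighted sum of the two terms. The weighting
    # scheme is an assumption, not taken from the paper.
    return (triplet_loss(e1, e2, e_neg, margin)
            + weight * interpolation_loss(e1, e2, e3))
```

With evenly spaced embeddings and a distant negative, for example `multitask_loss([0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0])`, both terms vanish; moving the negative onto the positive makes the triplet term equal to the margin.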

Paper Structure

This paper contains 32 sections, 4 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Distribution of datasets, data stories and dashboards in different domains.
  • Figure 2: The formulation details of an example chart fact: (1) the graphical presentation of the visualization data, (2) the example chart fact representation, (3) the fact schema, which shows the structural information in the chart fact, (4) the fact semantics, which indicates the semantic information in the chart fact, and (5) the locations of the fields in the chart fact that store semantic information.
  • Figure 3: Formulation of Chart2Vec. During model training, the inputs are a set of four visualizations, which are passed through the Chart2Vec model with shared parameters. The two loss functions are used to jointly optimize the model parameters.
  • Figure 4: Architecture of the Chart2Vec model.
  • Figure 5: The construction of the user study training dataset. We selected an example dataset and represented all its related charts in this figure. Each chart is represented as a node, and charts from different multi-view visualizations are marked in different colors. Taking $A_3$ as the anchor chart, we calculate its distance to each of the other charts. Two charts, $A_2$ and $B_3$, fall within the 15% nearest charts, shown in the filled circle ❶. We randomly select one of them as the first candidate, $Cand_1$. Two other charts, $A_1$ and $B_1$, lie in the second range, shown in the filled circle ❷, and we again randomly select one as the other candidate, $Cand_2$.
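
The candidate selection described in the Figure 5 caption can be sketched as follows. This is an illustrative sketch only: the caption gives the 15% threshold for the nearest range but does not state where the second range ends, so the `far_frac=0.30` default is a hypothetical placeholder, as are the function name `pick_candidates`, the use of Euclidean distance, and the dictionary layout of the inputs.

```python
import math
import random


def pick_candidates(anchor, others, near_frac=0.15, far_frac=0.30, rng=None):
    """Pick one candidate from each of two distance ranges around `anchor`.

    `others` maps chart names to embedding vectors. `far_frac` is an
    assumed boundary for the second range; the caption does not give it.
    """
    if len(others) < 2:
        raise ValueError("need at least two other charts")
    rng = rng or random.Random()

    # Rank all other charts by distance to the anchor, nearest first.
    ranked = sorted(others, key=lambda name: math.dist(anchor, others[name]))

    # First range: the nearest `near_frac` share of charts (at least one).
    near_end = max(1, math.ceil(near_frac * len(ranked)))
    # Second range: the charts that follow, up to `far_frac` (at least one).
    far_end = max(near_end + 1, math.ceil(far_frac * len(ranked)))

    cand_1 = rng.choice(ranked[:near_end])
    cand_2 = rng.choice(ranked[near_end:far_end])
    return cand_1, cand_2
```

Passing a seeded `random.Random` instance as `rng` makes the random draws reproducible, which can help when the same candidate pairs need to be shown to several study participants.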