
DataVisT5: A Pre-trained Language Model for Jointly Understanding Text and Data Visualization

Zhuoyue Wan, Yuanfeng Song, Shuaimin Li, Chen Jason Zhang, Raymond Chi-Wing Wong

TL;DR

DataVisT5 addresses the challenge of jointly understanding text and data visualizations by introducing a DV-specialized pre-trained language model built on the T5 architecture. It couples a hybrid pre-training regime (Bidirectional Dual-Corpus objectives and masked language modeling, MLM) with unified DV knowledge encoding and standardized DV query representations, followed by multi-task fine-tuning across text-to-vis, vis-to-text, FeVisQA, and table-to-text. The model leverages cross-modal data from NVBench, Chart2Text, WikiTableText, and FeVisQA to learn aligned text-DV semantics, achieving state-of-the-art performance across all four DV tasks and demonstrating robustness in cross-domain settings. This work offers a scalable path to more capable, cross-modal PLMs for data visualization and analytics, with potential impact on both research and practical DV tooling.
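The hybrid pre-training regime can be illustrated with a minimal Python sketch of how training pairs might be built from one aligned (NL question, DV query) pair: the Bidirectional Dual-Corpus objective yields one example per translation direction, and T5-style span corruption (MLM) masks spans on each side. The task prefixes `text to dv:` / `dv to text:`, the helper names, and the simplified fixed span length are assumptions for illustration, not details taken from the paper; the 15% noise density follows the original T5 recipe, and `<extra_id_N>` is the sentinel naming used by the Hugging Face T5 tokenizer.

```python
import random
from typing import List, Tuple

def bidirectional_pairs(text: str, dv: str) -> List[Tuple[str, str]]:
    """Bidirectional Dual-Corpus objective: one aligned pair yields one
    example per direction. The task prefixes are hypothetical."""
    return [
        ("text to dv: " + text, dv),  # text -> DV query
        ("dv to text: " + dv, text),  # DV query -> text
    ]

def span_corrupt(tokens: List[str], noise_density: float = 0.15,
                 span_len: int = 3, seed: int = 0) -> Tuple[str, str]:
    """T5-style span corruption: each masked span becomes one sentinel in
    the input; the target lists each sentinel followed by the tokens it
    hides and ends with one final sentinel. Simplification: spans have a
    fixed maximum length instead of sampled lengths."""
    n = len(tokens)
    if n == 0:
        raise ValueError("need at least one token")
    rng = random.Random(seed)
    num_noise = max(1, round(n * noise_density))
    mask = [False] * n
    # Pick random span starts until the noise budget is used up; adjacent
    # masked tokens merge into a single span.
    while sum(mask) < num_noise:
        start = rng.randrange(n)
        for i in range(start, min(n, start + span_len)):
            if sum(mask) >= num_noise:
                break
            mask[i] = True
    inputs, targets = [], []
    sentinel = 0
    for i, (tok, masked) in enumerate(zip(tokens, mask)):
        if masked:
            if i == 0 or not mask[i - 1]:  # start of a new masked span
                marker = f"<extra_id_{sentinel}>"
                inputs.append(marker)
                targets.append(marker)
                sentinel += 1
            targets.append(tok)
        else:
            inputs.append(tok)
    targets.append(f"<extra_id_{sentinel}>")
    return " ".join(inputs), " ".join(targets)

def hybrid_examples(text: str, dv: str, seed: int = 0) -> List[Tuple[str, str]]:
    """Mix both objectives: two translation pairs plus MLM on each side."""
    examples = bidirectional_pairs(text, dv)
    examples.append(span_corrupt(text.split(), seed=seed))
    examples.append(span_corrupt(dv.split(), seed=seed + 1))
    return examples
```

Calling `hybrid_examples("show the number of students in each city", "visualize bar select city , count(*) from student group by city")` yields four (source, target) pairs: two translation examples and two reconstruction examples.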

Abstract

Data visualization (DV) is a fundamental tool for efficiently conveying the insights behind big data, and it is widely used in today's data-driven world. Automating DV tasks, such as converting natural-language queries into visualizations (text-to-vis), generating explanations of visualizations (vis-to-text), answering free-form DV-related questions (FeVisQA), and describing tabular data (table-to-text), is vital for advancing the field. Despite their potential, pre-trained language models (PLMs) such as T5 and BERT have seen limited use in DV because of high costs and the difficulty of handling cross-modal information, so few studies have applied PLMs to DV. We introduce DataVisT5, a novel PLM tailored for DV that enhances the T5 architecture with hybrid-objective pre-training and a multi-task fine-tuning strategy, integrating text and DV datasets to interpret cross-modal semantics effectively. Extensive evaluations on public datasets show that DataVisT5 consistently outperforms current state-of-the-art models on various DV-related tasks. We anticipate that DataVisT5 will not only inspire further research on vertical (domain-specific) PLMs but also expand the range of applications for PLMs.
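Multi-task fine-tuning trains one seq2seq model on all four DV tasks at once. A minimal sketch, assuming each source is prefixed with its task name and tasks are mixed with the examples-proportional-with-cap scheme from the original T5 work; the paper's actual mixing strategy may differ, and `TOY_DATA`, `mixing_rates`, and `sample_batch` are hypothetical names with made-up toy examples.

```python
import random
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (source, target)

# Hypothetical toy data: one example per task, for illustration only.
TOY_DATA: Dict[str, List[Example]] = {
    "text-to-vis": [("show the number of students in each city",
                     "visualize bar select city , count(*) from student group by city")],
    "vis-to-text": [("visualize bar select city , count(*) from student group by city",
                     "a bar chart showing the number of students in each city")],
    "FeVisQA": [("what type of chart is this? | visualize bar select city , count(*) "
                 "from student group by city", "bar chart")],
    "table-to-text": [("city | students || paris | 3", "paris has 3 students")],
}

def mixing_rates(sizes: Dict[str, int], cap: int) -> Dict[str, float]:
    """Examples-proportional mixing with an artificial size limit `cap`,
    so that a very large task cannot dominate the mixture."""
    capped = {task: min(n, cap) for task, n in sizes.items()}
    total = sum(capped.values())
    return {task: c / total for task, c in capped.items()}

def sample_batch(datasets: Dict[str, List[Example]], batch_size: int,
                 cap: int, rng: random.Random) -> List[Example]:
    """Draw a mixed batch; each source is prefixed with its task name so a
    single model can tell the tasks apart."""
    rates = mixing_rates({t: len(d) for t, d in datasets.items()}, cap)
    tasks = list(rates)
    picks = rng.choices(tasks, weights=[rates[t] for t in tasks], k=batch_size)
    batch = []
    for task in picks:
        src, tgt = rng.choice(datasets[task])
        batch.append((f"{task}: {src}", tgt))
    return batch
```

For example, `sample_batch(TOY_DATA, batch_size=8, cap=2**16, rng=random.Random(0))` returns eight prefixed pairs drawn across the four tasks. The cap matters when one dataset (e.g., a large text-to-vis corpus) is much bigger than the others.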

Paper Structure

This paper contains 28 sections, 3 equations, 9 figures, 12 tables.

Figures (9)

  • Figure 1: An illustration of the text-to-vis, vis-to-text, table-to-text, and free-form question-answering over data visualization problems, showing an NL question, a DV query, a DVL visualization specification, a table description, a visualization chart, and four question-answer pairs.
  • Figure 2: The pipeline of DataVisT5.
  • Figure 3: Examples of DV Knowledge Encoding and Standardized Encoding from NVBench.
  • Figure 4: An example of a standardized DV query with a join operation.
  • Figure 5: Overview of hybrid pre-training objectives. The solid lines denote the Bidirectional Dual-Corpus objectives, which facilitate the learning of language representation by leveraging bidirectional context. The dashed lines represent the T5-based MLM objectives, designed to reconstruct the original input from masked tokens.
  • ...and 4 more figures
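Figures 3 and 4 concern standardized encoding of DV queries, including queries with joins. A minimal sketch, assuming standardization means resolving table aliases (e.g., `T1.Name` to `student.name`), dropping the `AS <alias>` clauses, using single quotes for string literals, and lowercasing and collapsing whitespace outside literals. These rules are assumptions, not the paper's exact specification, and `standardize_dv_query` is a hypothetical helper.

```python
import re

def standardize_dv_query(query: str) -> str:
    """Hypothetical normalization of an nvBench-style DV query."""
    q = query.strip()
    # 1. Collect table aliases introduced as "FROM <table> AS <alias>" or
    #    "JOIN <table> AS <alias>" (matched case-insensitively).
    alias_to_table = {
        m.group(2).lower(): m.group(1).lower()
        for m in re.finditer(r"\b(?:from|join)\s+(\w+)\s+as\s+(\w+)", q, flags=re.IGNORECASE)
    }
    # 2. Drop the "AS <alias>" that follows each table name.
    q = re.sub(r"\b((?:from|join)\s+\w+)\s+as\s+\w+", r"\1", q, flags=re.IGNORECASE)
    # 3. Rewrite "<alias>.<column>" as "<table>.<column>".
    for alias, table in alias_to_table.items():
        q = re.sub(rf"\b{re.escape(alias)}\.", table + ".", q, flags=re.IGNORECASE)
    # 4. Use single quotes for string literals.
    q = q.replace('"', "'")
    # 5. Lowercase and collapse whitespace outside literals: after splitting
    #    on quotes, even-indexed parts lie outside string literals.
    parts = q.split("'")
    q = "'".join(re.sub(r"\s+", " ", p.lower()) if i % 2 == 0 else p
                 for i, p in enumerate(parts))
    return q.strip()
```

For instance, `Visualize BAR SELECT T1.Name , COUNT(*) FROM Student AS T1 JOIN Has_Pet AS T2 ON T1.StuID = T2.StuID WHERE T1.Sex = "F" GROUP BY T1.Name` becomes `visualize bar select student.name , count(*) from student join has_pet on student.stuid = has_pet.stuid where student.sex = 'F' group by student.name`. Leaving literal values untouched preserves their meaning on case-sensitive databases.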