ChartifyText: Automated Chart Generation from Data-Involved Texts via LLM

Songheng Zhang, Lei Wang, Toby Jia-Jun Li, Qiaomu Shen, Yixin Cao, Yong Wang

TL;DR

This paper proposes ChartifyText, a novel, fully automated approach that leverages Large Language Models (LLMs) to convert complex data-involved texts into expressive charts that accurately convey the underlying data and insights to readers.

Abstract

Text documents with numerical values involved are widely used in various applications such as scientific research, economy, public health and journalism. However, it is difficult for readers to quickly interpret such data-involved texts and gain deep insights. To fill this research gap, this work aims to automatically generate charts to accurately convey the underlying data and ideas to readers, which is essentially a challenging task. The challenges originate from text ambiguities, intrinsic sparsity and uncertainty of data in text documents, and subjective sentiment differences. Specifically, we propose ChartifyText, a novel fully-automated approach that leverages Large Language Models (LLMs) to convert complex data-involved texts to expressive charts. It consists of two major modules: tabular data inference and expressive chart generation. The tabular data inference module employs systematic prompt engineering to guide the LLM (e.g., GPT-4) to infer table data, where data ranges, uncertainties, missing data values and corresponding subjective sentiments are explicitly considered. The expressive chart generation module augments standard charts with intuitive visual encodings and concise texts to accurately convey the underlying data and insights. We extensively evaluate the effectiveness of ChartifyText on real-world data-involved text documents through case studies, in-depth interviews with three visualization experts, and a carefully-designed user study with 15 participants. The results demonstrate the usefulness and effectiveness of ChartifyText in helping readers efficiently and effectively make sense of data-involved texts.

Paper Structure

This paper contains 29 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Overview of transforming text into a chart. The process begins with (A) the inputs, which contain (A1) the text statement the user selects and (A2) the context of that statement. (B) Tabular Data Inference transforms the text statement into a data table in 4 steps. (C) The resulting data table. (D) An appropriate chart type is recommended based on the characteristics of the tabular data, and the chart is augmented with special visual encodings to accurately present the underlying data. (E) The generated chart representing the selected text statement.
  • Figure 2: Value Inference in Data Table: (A) In A1, yellow highlights indicate the header row, while purple highlights mark row identifiers, both generated during Table Schema Creation. Cells with downplayed values indicate non-convertible data; empty cells signify missing values not directly found in the text statement. A2 denotes the inputs: text statement and its context. (B) After the Data Inference, the table is completed, with inferred values assigned uncertainty scores to reflect confidence levels.
  • Figure 3: Special Encoding Designs. (A) Uncertainty Encoding represents the degree of uncertainty of a data value; a longer stripe means larger uncertainty. (B) Data Range Encoding represents data ranges in two conditions: the data value lies within a specific range whose maximum and minimum values are determined, or the data value is either smaller or larger than a specific value. (C) An encoding that represents values that may not be in the text. (D) Sentiment Encoding uses a text annotation to describe the topic and background colors to represent positive, negative, and neutral sentiments, respectively.
  • Figure 4: User study evaluation results. (A) The average time to answer a question. (B) The average score of the answers. (C) The NASA Task Load Index.
  • Figure 5: Participants' ratings for the four visual encodings on a 7-point scale.
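
The captions for Figures 2 and 3 suggest that each inferred table cell carries a value, an optional range, an uncertainty score, a flag for values not found in the text, and a sentiment, and that each of these maps to one of the four visual encodings. The following is a minimal sketch of that mapping, not the authors' implementation. The names `InferredCell`, `encodings_for`, and `stripe_length`, the encoding labels, and the assumption that uncertainty scores lie in [0, 1] are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InferredCell:
    """One cell of the inferred data table (hypothetical schema)."""
    value: Optional[float] = None   # point value, if one is known
    lower: Optional[float] = None   # known lower bound, if any
    upper: Optional[float] = None   # known upper bound, if any
    uncertainty: float = 0.0        # assumed to lie in [0, 1]
    not_in_text: bool = False       # True if the value was inferred, not stated
    sentiment: str = "neutral"      # "positive", "negative", or "neutral"


def encodings_for(cell: InferredCell) -> List[str]:
    """Return the labels of the special encodings a cell would need."""
    encodings = []
    if cell.uncertainty > 0:
        encodings.append("uncertainty")   # Figure 3 (A)
    if cell.lower is not None or cell.upper is not None:
        encodings.append("data-range")    # Figure 3 (B): bounded or one-sided
    if cell.not_in_text:
        encodings.append("not-in-text")   # Figure 3 (C)
    encodings.append("sentiment")         # Figure 3 (D): always has a background
    return encodings


def stripe_length(uncertainty: float, max_length: float) -> float:
    """Longer stripe for larger uncertainty; a linear scale is an assumption."""
    clamped = min(max(uncertainty, 0.0), 1.0)
    return clamped * max_length


# A value stated only as "more than 10", inferred with moderate confidence.
cell = InferredCell(lower=10.0, uncertainty=0.4, not_in_text=True)
print(encodings_for(cell))
print(stripe_length(cell.uncertainty, 20.0))
```

Keeping the range bounds separate from the point value lets one record cover both conditions in Figure 3 (B): a closed range sets both bounds, while a one-sided statement sets only one.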