Can LLMs Generate Visualizations with Dataless Prompts?

Darius Coelho, Harshit Barot, Naitik Rathod, Klaus Mueller

TL;DR

The paper asks whether large language models can generate accurate data visualizations from dataless prompts, i.e., queries without accompanying data. It evaluates GPT-3.5 and GPT-4 (and DALL-E in early tests) on 15 dataless prompts, comparing the generated visuals against the visualization cheat sheet by Patrik Lundblad (Qlik) and against Google Images for ground-truth reference. Results show that GPT-4 can produce coherent, data-informed visuals that align with best-practice guidelines, though exact numerical values may differ from the ground-truth sources; GPT-3.5 is less reliable. The findings suggest that LLMs encode substantial data-visualization knowledge and can generate useful visuals directly from prompts. Future work would explore automated data retrieval and infographic generation with DALL-E.

Abstract

Recent advancements in large language models have revolutionized information access, as these models harness data available on the web to address complex queries, becoming the preferred information source for many users. In certain cases, queries are about publicly available data, which can be effectively answered with data visualizations. In this paper, we investigate the ability of large language models to provide accurate data and relevant visualizations in response to such queries. Specifically, we investigate the ability of GPT-3 and GPT-4 to generate visualizations with dataless prompts, where no data accompanies the query. We evaluate the results of the models by comparing them to visualization cheat sheets created by visualization experts.

Paper Structure (9 sections, 3 figures)

Figures (3)

  • Figure 1: Visualizations showing the U.S. debt over the last two decades. (a) is the ground truth visualization retrieved from Statista while the remaining visualizations are generated with the prompt "Generate a chart showing the national debt of the U.S. over the last 2 decades" with (b) DALL-E, (c) GPT-3.5, and (d) GPT-4.
  • Figure 2: The visualization cheat sheet created by Patrik Lundblad at Qlik to help visualization designers pick appropriate charts for their data.
  • Figure 3: Visualizations retrieved from (a) Google Images and (b) GPT-4 showing the U.S. cities with the highest average rent. Compared with the visualization from Google Images, GPT-4 generated a similar ordering of cities but provided different values for mean rents, perhaps because it retrieved data from a different year.