Generating Analytic Specifications for Data Visualization from Natural Language Queries using Large Language Models
Subham Sah, Rishab Mitra, Arpit Narechania, Alex Endert, John Stasko, Wenwen Dou
TL;DR
This paper tackles the explainability gap in natural-language-to-visualization (NL2VIS) systems by designing NL4DV-LLM, a comprehensive prompt that converts a natural language (NL) query and a tabular dataset into a transparent analytic specification (attributes, analytic tasks, and visualizations) in JSON form. The prompt embeds a formal analytic-task taxonomy, supports conversational follow-up queries, and uses a data-subset strategy to respect token limits. Its output is designed for easy debugging and for integration into NL4DV. In a preliminary GPT-4 evaluation on 740 queries across three domains, NL4DV-LLM achieved 87.02% accuracy compared with 64.05% for NL4DV, albeit with a longer average response time (~25 seconds). The work demonstrates promising gains in explainability and debuggability for NL2VIS while acknowledging limitations, such as potential JSON and encoding errors and the need for validation across more LLMs and broader dataset coverage. The prompt is open-source at nl4dv.github.io.
Abstract
Recently, large language models (LLMs) have shown great promise in translating natural language (NL) queries into visualizations, but their "black-box" nature often limits explainability and debuggability. In response, we present a comprehensive text prompt that, given a tabular dataset and an NL query about the dataset, generates an analytic specification including (detected) data attributes, (inferred) analytic tasks, and (recommended) visualizations. This specification captures key aspects of the query translation process, affording both explainability and debuggability. For instance, it provides mappings from the detected entities to the corresponding phrases in the input query, as well as the specific visual design principles that determined the visualization recommendations. Moreover, unlike prior LLM-based approaches, our prompt supports conversational interaction and ambiguity detection. In this paper, we detail the iterative process of curating our prompt, present a preliminary performance evaluation using GPT-4, and discuss the strengths and limitations of LLMs at various stages of query translation. The prompt is open-source and integrated into NL4DV, a popular Python-based natural language toolkit for visualization, which can be accessed at https://nl4dv.github.io.
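To make the idea of an analytic specification more concrete, the following is a minimal Python sketch. It assumes a simplified, hypothetical JSON layout: the key names (`attributes`, `query_phrase`, `tasks`, `visualizations`, `reason`) and the helper functions `parse_spec` and `unmapped_attributes` are illustrative only and are not NL4DV's actual output schema or API. The sketch parses a specification as an LLM might return it, fails loudly on malformed JSON, and flags any detected attribute that is not traced back to a phrase in the query, the kind of check that the entity-to-phrase mappings make possible.

```python
import json

# Hypothetical, simplified analytic specification. Key names are
# illustrative assumptions, not NL4DV's actual output schema.
RAW_SPEC = """
{
  "query": "Show average gross across genres",
  "attributes": [
    {"name": "Worldwide Gross", "query_phrase": "gross"},
    {"name": "Genre", "query_phrase": "genres"}
  ],
  "tasks": [
    {"task": "derived_value", "attribute": "Worldwide Gross", "operator": "AVG"}
  ],
  "visualizations": [
    {"mark": "bar", "x": "Genre", "y": "AVG(Worldwide Gross)",
     "reason": "one nominal and one aggregated quantitative attribute"}
  ]
}
"""


def parse_spec(raw: str) -> dict:
    """Parse an LLM response into a dict, surfacing malformed JSON early."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as err:
        raise ValueError(f"LLM returned invalid JSON: {err}") from err


def unmapped_attributes(spec: dict) -> list:
    """Return detected attributes with a missing phrase or one absent from the query."""
    query = spec["query"].lower()
    flagged = []
    for attr in spec["attributes"]:
        phrase = attr.get("query_phrase", "").lower()
        if not phrase or phrase not in query:
            flagged.append(attr["name"])
    return flagged


spec = parse_spec(RAW_SPEC)
print(unmapped_attributes(spec))  # []
```

Keeping the parsing and the traceability check separate mirrors the two failure modes the paper distinguishes: output that is not valid JSON at all, and output that is valid but whose detected entities cannot be explained by the input query.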
