Evaluating the Semantic Profiling Abilities of LLMs for Natural Language Utterances in Data Visualization
Hannah K. Bako, Arshnoor Bhutani, Xinyi Liu, Kwesi A. Cobbina, Zhicheng Liu
TL;DR
This study evaluates how well four publicly available LLMs semantically profile natural-language utterances for data visualization, using a 500-utterance corpus. It measures three facets: uncertainty handling, identification of the relevant data context (attributes and transformations), and visualization-task inference. The findings show that LLMs often extract the relevant data attributes and transformations correctly, but they are highly sensitive to uncertainty in utterances and struggle to infer visualization tasks, disagreeing substantially with human annotations. These insights point to targeted research directions for integrating LLMs into NLIs for visualization, emphasizing improved prompt design, robust code generation, and interactive clarification to better align with human intent.
Abstract
Automatically generating data visualizations in response to human utterances about datasets requires a deep semantic understanding of each utterance, including implicit and explicit references to data attributes, visualization tasks, and necessary data preparation steps. Natural Language Interfaces (NLIs) for data visualization have explored ways to infer such information, yet challenges persist due to the inherent uncertainty in human speech. Recent advances in Large Language Models (LLMs) provide an avenue to address these challenges, but their ability to extract the relevant semantic information remains unexplored. In this study, we evaluate four publicly available LLMs (GPT-4, Gemini-Pro, Llama3, and Mixtral), investigating their ability to comprehend utterances even in the presence of uncertainty and to identify the relevant data context and visual tasks. Our findings reveal that LLMs are sensitive to uncertainties in utterances. Despite this sensitivity, they are able to extract the relevant data context. However, LLMs struggle with inferring visualization tasks. Based on these results, we highlight future research directions on using LLMs for visualization generation.
