
V-RECS, a Low-Cost LLM4VIS Recommender with Explanations, Captioning and Suggestions

Luca Podo, Marco Angelini, Paola Velardi

TL;DR

V-RECS addresses NL2VIS challenges with a low-cost, explainable visualization recommender that outputs not only a visualization but also an explanation, a caption, and suggestions for further exploration. It employs a teacher-student Chain-of-Thought framework, using GPT-4 to generate narrative-rich training data and fine-tuning a small Llama-2-7B model via QLoRA to emulate this behavior, which enables scalable deployment. Evaluations on NvBench with the EvaLLM framework show that V-RECS performs comparably to GPT-4 on most metrics while delivering superior axis correctness and richer explanations, although some of its captions lag behind. The work provides a practical, open-source pipeline for narrative-enabled visualization generation and highlights pathways for integrating LLM4VIS into real-world visualization and visual analytics workflows.
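The low-cost claim rests on QLoRA: the frozen base model is stored in 4-bit precision and only small low-rank adapter matrices are trained. The rough arithmetic below sketches why that is cheap. The parameter count (about 6.74 billion), layer count (32), hidden size (4096), LoRA rank (8) and choice of adapted matrices (query and value projections) are illustrative assumptions, not settings reported in this summary, and the estimate covers base weights only.

```python
# Back-of-envelope sizing for QLoRA on a Llama-2-7B-sized model.
# All architecture and LoRA settings below are illustrative assumptions.

N_PARAMS = 6.74e9   # approximate Llama-2-7B parameter count
N_LAYERS = 32       # decoder layers
HIDDEN = 4096       # hidden size; q_proj and v_proj are HIDDEN x HIDDEN
RANK = 8            # assumed LoRA rank
N_TARGETS = 2       # assumed adapted matrices per layer (query and value)


def weight_gib(n_params: float, bits: int) -> float:
    """Storage for the frozen base weights alone, in GiB.

    Ignores quantization constants, activations and optimizer state."""
    return n_params * bits / 8 / 2**30


def lora_params(n_layers: int, d_in: int, d_out: int, rank: int, n_targets: int) -> int:
    """Each adapted d_out x d_in matrix gains A (rank x d_in) and B (d_out x rank)."""
    return n_layers * n_targets * rank * (d_in + d_out)


fp16 = weight_gib(N_PARAMS, 16)
nf4 = weight_gib(N_PARAMS, 4)
trainable = lora_params(N_LAYERS, HIDDEN, HIDDEN, RANK, N_TARGETS)

print(f"16-bit base weights: {fp16:.1f} GiB")
print(f"4-bit base weights:  {nf4:.1f} GiB")
print(f"trainable LoRA parameters: {trainable:,} ({trainable / N_PARAMS:.3%} of the base)")
```

Under these assumptions, the 4-bit base weights take a quarter of the 16-bit footprint (roughly 3.1 GiB instead of 12.6 GiB), and the trainable adapters amount to about 4.2 million parameters, well below 0.1% of the model. Real QLoRA runs also store quantization constants, activations and adapter optimizer state, so actual memory use is higher.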

Abstract

NL2VIS (natural language to visualization) is a promising and recent research area that involves interpreting natural language queries and translating them into visualizations that accurately represent the underlying data. As we navigate the era of big data, NL2VIS holds considerable application potential, since it greatly facilitates data exploration by non-expert users. Following the increasingly widespread use of generative AI in NL2VIS applications, in this paper we present V-RECS, the first LLM-based Visual Recommender augmented with explanations (E), captioning (C), and suggestions (S) for further data exploration. V-RECS' visualization narratives facilitate both response verification and data exploration by non-expert users. Furthermore, our proposed solution mitigates the computational, controllability, and cost issues associated with using powerful LLMs by leveraging a methodology to effectively fine-tune small models. To generate insightful visualization narratives, we use Chain-of-Thought (CoT), a prompt engineering technique that helps an LLM identify and generate the logical steps needed to produce a correct answer. Since CoT is reported to perform poorly with small LLMs, we adopted a strategy in which a large LLM (GPT-4), acting as a Teacher, generates CoT-based instructions to fine-tune a small model, Llama-2-7B, which plays the role of a Student. Extensive experiments, based on a framework for the quantitative evaluation of AI-based visualizations and on manual assessment by a group of participants, show that V-RECS achieves performance scores comparable to GPT-4 at a much lower cost. The efficacy of the V-RECS teacher-student paradigm is also demonstrated by the fact that the untuned Llama fails to perform the task in the vast majority of test cases. We release V-RECS for the visualization community to assist visualization designers throughout the entire visualization generation process.
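The Teacher-Student strategy amounts to a data-augmentation loop: each (D, Q, V) example is sent to the Teacher for the three CoT tasks, and the answers are merged into a narrative VN on which the Student is later fine-tuned. The Python sketch below shows that loop under stated assumptions: the Teacher is any callable (a canned stub stands in for GPT-4 so the code runs offline), the "Explanation/Caption/Suggestions" layout of VN is hypothetical, and the toy table and VegaZero-style string are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Triple:
    data: str   # D: tabular data, serialized as text
    query: str  # Q: natural language query
    vis: str    # V: VegaZero specification of the target chart


@dataclass(frozen=True)
class Quadruple:
    data: str
    query: str
    vis: str
    narrative: str  # VN: explanation, caption and suggestions combined


# Hypothetical Teacher interface: (task id, example) -> response text.
Teacher = Callable[[str, Triple], str]
TASKS = ("T1", "T2", "T3")  # T1 explanation, T2 caption, T3 suggested queries


def augment(triples: Iterable[Triple], teacher: Teacher) -> Iterator[Quadruple]:
    """Turn (D, Q, V) triples into (D, Q, V, VN) quadruples."""
    for t in triples:
        explanation, caption, suggestions = (teacher(task, t) for task in TASKS)
        # Hypothetical layout for VN; the real template is not given here.
        narrative = (
            f"Explanation: {explanation}\n"
            f"Caption: {caption}\n"
            f"Suggestions: {suggestions}"
        )
        yield Quadruple(t.data, t.query, t.vis, narrative)


def stub_teacher(task: str, t: Triple) -> str:
    """Offline stand-in for GPT-4 that returns canned text for each task."""
    canned = {
        "T1": f"The query {t.query!r} asks for a ranking, so V sorts bars by revenue.",
        "T2": "Bar chart of total revenue per product line.",
        "T3": "Which product line has the highest average revenue per sale?",
    }
    return canned[task]


examples = [
    Triple(
        data="product_line,revenue\nFood,120\nSports,95",  # toy table
        query="Which product lines generate the most revenue?",
        vis="mark bar data sales encoding x product_line y aggregate sum revenue "
            "transform group x sort y desc",
    )
]
augmented = list(augment(examples, stub_teacher))
print(augmented[0].narrative)
```

Keeping the Teacher behind a plain callable makes the loop testable offline; swapping in a real model client only changes the function passed to `augment`.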

Paper Structure

This paper contains 19 sections, 2 equations, 9 figures, and 4 tables.

Figures (9)

  • Figure 1: V-RECS model task workflow. The three green boxes represent V-RECS' specific tasks, while the others are common to all LLM4VIS models.
  • Figure 2: During the training phase (left), a large LLM, GPT-4, acting as a Teacher (panel A), receives as input triples (D,Q,V) extracted from an available dataset, where D is tabular data, Q is a natural language query, and V is the VegaZero specification for a chart matching the user's query Q. The model is prompted to perform three Chain-of-Thought (CoT) tasks: T1, to explain the reasoning steps that justify the answer V given the query Q; T2, to generate a caption that describes the chart; and T3, to suggest additional useful queries. The model's responses are combined into a visualization narrative (VN) used to augment the initial dataset (panel C). Next, a small model, Llama-2-7B (the Student), is fine-tuned (panel D) on the enriched dataset of quadruples (D,Q,V,VN). At inference time (right), the resulting specialized model, named V-RECS, receives as input a pair (D,Q) and generates, along with a visualization recommendation V, a visualization narrative VN made of an explanation E, a caption C, and a suggestion S of additional queries. Adopting a teacher-student metaphor, the esteemed Professor GPT-4, leveraging its trillion-parameter knowledge of data visualization and natural language understanding, generates examples of visualization narratives for the young student Llama-2-7B. Llama uses this supplementary material to diligently learn the task of producing similar narratives alongside the recommended visualization, and finally earns its doctorate in Visual Recommendation with Narratives, becoming Dr. V-RECS. In its professional practice, will Dr. V-RECS be able to match Professor GPT-4's abilities? Certainly, its professional fees are much lower.
  • Figure 3: Structure of the T1 prompt. The related task is described in the table of Teacher tasks. The prompt has three parts: A encourages the LLM to reason "step-by-step" as per the Chain-of-Thought method, B describes the sub-tasks (steps) that guide the model, and C is the response template. Note that, in box B, Steps 2 and 3 direct the system to split the explanation into two parts. Part 1 (E1) summarizes the user's information needs, to make sure they have been correctly interpreted; part 2 (E2) explains why, given D and Q, certain features X and Y and transformation functions were selected.
  • Figure 4: Training template during fine-tuning. The leftmost box shows how the Teacher's responses R1, R2, and R3 contribute to filling the different parts of a Student's training instance. The rightmost box is a (partly) filled template for a specific training instance.
  • Figure 5: Example of a V-RECS response at inference time for the query "Which product lines generate the most revenue?"
  • ...and 4 more figures
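Figures 3 and 4 describe two templates: the three-part T1 prompt sent to the Teacher, and the Student's training instance filled from the Teacher's responses R1, R2 and R3. A minimal sketch of both follows, assuming plain-text prompts and an instruction/input/output record layout stored as JSON Lines. Every field name, step text and section marker here is a hypothetical placeholder, since the exact wording appears only in the figures.

```python
import json

# Hypothetical placeholders for the three parts of the T1 prompt (Figure 3).
PART_A = "Let's think step by step."  # A: Chain-of-Thought cue
PART_B_STEPS = (  # B: sub-task steps (wording invented for illustration)
    "Step 1: Inspect the columns of D and read the query Q.",
    "Step 2: Summarize the user's information need (E1).",
    "Step 3: Explain why the chosen X, Y and transformations suit D and Q (E2).",
)
PART_C = "Answer format:\nE1: <information need>\nE2: <justification>"  # C: response template


def build_t1_prompt(data: str, query: str, vis: str) -> str:
    """Assemble a Teacher prompt for T1 from the example plus parts A, B and C."""
    sections = [
        f"Data:\n{data}",
        f"Query: {query}",
        f"Chart specification: {vis}",
        PART_A,
        "\n".join(PART_B_STEPS),
        PART_C,
    ]
    return "\n\n".join(sections)


def build_student_record(data: str, query: str, vis: str,
                         r1: str, r2: str, r3: str) -> dict:
    """Fill one Student training instance (Figure 4): the input holds (D, Q),
    the target holds V followed by the Teacher's responses R1, R2 and R3."""
    return {
        "instruction": "Recommend a chart for the query and explain it.",
        "input": f"Data:\n{data}\nQuery: {query}",
        "output": (
            f"Visualization: {vis}\n"
            f"Explanation: {r1}\n"
            f"Caption: {r2}\n"
            f"Suggestions: {r3}"
        ),
    }


table = "product_line,revenue\nFood,120\nSports,95"  # toy data
query = "Which product lines generate the most revenue?"
vis = "mark bar data sales encoding x product_line y aggregate sum revenue transform group x sort y desc"

prompt = build_t1_prompt(table, query, vis)
record = build_student_record(
    table, query, vis,
    r1="E1: the user wants product lines ranked by revenue. "
       "E2: product_line on x, summed revenue on y, sorted descending.",
    r2="Total revenue per product line.",
    r3="How does revenue per product line change by month?",
)
print(json.dumps(record))  # one JSON Lines row of the fine-tuning set
```

Because the Student only sees (D, Q) in its input and must produce V together with the narrative, the same record format can be reused unchanged at inference time by leaving the output empty.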