Captioning Visualizations with Large Language Models (CVLLM): A Tutorial

Giuseppe Carenini; Jordon Johnson; Ali Salamatian

TL;DR

This tutorial covers how to caption visualizations with large language models (LLMs) and large vision-language models (LVLMs), linking InfoVis fundamentals (abstractions, marks, channels) with transformer-based NLP techniques. It outlines LLM limitations (e.g., arithmetic reasoning, planning, hallucinations) and mitigation approaches such as chain-of-thought prompting (CoT), retrieval-augmented generation (RAG), and reinforcement learning from human feedback (RLHF), and highlights LVLM progress on visualization captioning. A survey of key papers and datasets (e.g., ChartToText, VisText) illustrates advances in dataset creation, modeling, and evaluation, and points to open challenges such as domain specificity, complex visualizations, and multilingual captioning. Overall, the tutorial guides researchers and practitioners in developing robust, accessible, and well-evaluated captioning systems for visualizations.

Abstract

Automatically captioning visualizations is not new, but recent advances in large language models (LLMs) open exciting new possibilities. In this tutorial, after a brief review of Information Visualization (InfoVis) principles and past work in captioning, we introduce neural models and the transformer architecture used in generic LLMs. We then discuss their recent applications in InfoVis, with a focus on captioning. Finally, we explore promising future directions in this field.

Paper Structure (11 sections, 4 figures)

Introduction
Past Editions, Similar Initiatives and Target Audience
Organization and Duration
Part 1
Key InfoVis Concepts: Abstractions, Marks, Channels
Captioning visualizations
Neural Networks and the Transformer architecture
Part 2
Large Language Models: Limitations and Recent Development [minaee2024large]
Recent Advances and Challenges in InfoVis Captioning: A Review of Key Papers
Presenters' Past Experiences

Figures (4)

Figure 1: "A four-level model of semantic content for accessible visualization. Levels are defined by the semantic content conveyed by natural language descriptions of visualizations." [2022-vis-text-model]
Figure 2: "The Y-axis identifies the houses in the three charts. In the left chart, house prices are shown along the X-axis. The house’s selling price is shown by the left edge of the bar, whereas the house’s asking price is shown by the right edge of the bar..." [mittal-etal-1998-describing]
Figure 3: "The scene-graph model’s output L1 caption and L2/L3 caption for a VisText bar chart..." [tang2023vistext]
Figure 4: "Error distribution for different models on VisText and Pew." [huang2024lvlms]
