LIDA: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models
Victor Dibia
TL;DR
LIDA tackles automatic visualization generation by orchestrating four modules (summarizer, goal explorer, visGenerator, and infographer) that convert data into a compact natural language summary used for grounding, propose visualization goals, produce executable visualization code across multiple grammars, and render data-faithful, stylized infographics. It uses LLMs for goal reasoning and code generation and IGMs for styling, all within a grammar-agnostic pipeline, and introduces two metrics: visualization error rate (VER) for reliability and self-evaluated visualization quality (SEVQ) for quality. The authors report a low VER (~3.5%) across 57 Vega-derived datasets, with an ablation showing benefits from enriched data summaries and multi-grammar support. The work provides an open-source Python API and UI, positioning LIDA as a modular building block for automated data exploration, storytelling, accessibility, and chart QA.
Abstract
Systems that support users in the automatic creation of visualizations must address several subtasks: understanding the semantics of data, enumerating relevant visualization goals, and generating visualization specifications. In this work, we pose visualization generation as a multi-stage generation problem and argue that well-orchestrated pipelines based on large language models (LLMs) such as ChatGPT/GPT-4 and image generation models (IGMs) are well suited to addressing these tasks. We present LIDA, a novel tool for generating grammar-agnostic visualizations and infographics. LIDA comprises four modules: a SUMMARIZER that converts data into a rich but compact natural language summary, a GOAL EXPLORER that enumerates visualization goals given the data, a VISGENERATOR that generates, refines, executes, and filters visualization code, and an INFOGRAPHER module that yields data-faithful stylized graphics using IGMs. LIDA provides a Python API and a hybrid user interface (direct manipulation and multilingual natural language) for interactive chart, infographic, and data story generation. Learn more about the project at https://microsoft.github.io/lida/
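
To make the SUMMARIZER stage concrete, the following is a minimal sketch of how a dataset might be reduced to a compact summary that can ground later LLM prompts. It is not LIDA's implementation or API: the function names `summarize` and `summary_to_text`, the summary fields, and the output wording are all assumptions chosen for illustration, and only the Python standard library is used.

```python
from statistics import mean


def summarize(rows):
    """Build a compact per-column summary from a list of dict rows.

    Hypothetical helper for illustration only; not part of LIDA.
    Assumes cell values are hashable scalars (str, int, float, None).
    """
    # Pivot row-oriented records into column-oriented value lists.
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)

    fields = []
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        # Treat a column as numeric only if every present value is int/float.
        numeric = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in present
        )
        field = {
            "column": name,
            "dtype": "number" if numeric else "string",
            "num_unique": len(set(present)),
            "samples": present[:3],
        }
        if numeric:
            field.update(min=min(present), max=max(present), mean=mean(present))
        fields.append(field)
    return {"num_rows": len(rows), "fields": fields}


def summary_to_text(summary):
    """Render the summary as short plain text suitable for a prompt."""
    lines = [f"Dataset with {summary['num_rows']} rows."]
    for f in summary["fields"]:
        line = f"- {f['column']} ({f['dtype']}): {f['num_unique']} unique values"
        if f["dtype"] == "number":
            line += f", range {f['min']} to {f['max']}"
        lines.append(line)
    return "\n".join(lines)


if __name__ == "__main__":
    rows = [
        {"city": "Oslo", "temp": 4.0},
        {"city": "Lima", "temp": 19.0},
        {"city": "Oslo", "temp": 7.0},
    ]
    print(summary_to_text(summarize(rows)))
```

In a pipeline of the kind described above, text like this would be passed to the goal-enumeration and code-generation stages so that the model reasons over column names, types, and ranges rather than the raw data.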
