VizAbility: Enhancing Chart Accessibility with LLM-based Conversational Interaction

Joshua Gorniak, Yoon Kim, Donglai Wei, Nam Wook Kim

TL;DR

VizAbility addresses the data-visualization accessibility gap for blind and low-vision users by marrying keyboard-navigable chart encodings with an LLM-based conversational QA system. It leverages Olli's text-based chart tree, Vega-Lite data, and context-aware prompting to support visual, analytical, contextual, and navigation queries, with proactive and reactive mechanisms to mitigate LLM failures. The approach is evaluated through a Q&A benchmark (87.39% classification accuracy; strong human/LLM agreement with Kendall's $\tau = 0.5526$, $p < 0.001$) and a qualitative user study with six BLV participants, showing meaningful usability gains and specific improvement opportunities. Findings indicate VizAbility outperforms baselines (including GPT-4V), enhances transparency about sources and reasoning, and demonstrates potential for integration into existing visualization workflows, while identifying directions for richer benchmarks and vision integration to broaden applicability.

Abstract

Traditional accessibility methods like alternative text and data tables typically underrepresent data visualization's full potential. Keyboard-based chart navigation has emerged as a potential solution, yet efficient data exploration remains challenging. We present VizAbility, a novel system that enriches chart content navigation with conversational interaction, enabling users to use natural language for querying visual data trends. VizAbility adapts to the user's navigation context for improved response accuracy and facilitates verbal command-based chart navigation. Furthermore, it can address queries for contextual information, designed to address the needs of visually impaired users. We designed a large language model (LLM)-based pipeline to address these user queries, leveraging chart data & encoding, user context, and external web knowledge. We conducted both qualitative and quantitative studies to evaluate VizAbility's multimodal approach. We discuss further opportunities based on the results, including improved benchmark testing, incorporation of vision models, and integration with visualization workflows.

Paper Structure (73 sections, 7 figures, 5 tables)

Figures (7)

  • Figure 1: An example of a user's keyboard traversal of the Olli Tree. Users can widen or narrow the scope of the text via the up and down arrow keys, respectively, and navigate between sibling nodes using the left and right arrow keys. To access individual data, users can press the 't' key to view a snapshot data table.
  • Figure 2: VizAbility pipeline takes a user query and refines it to improve clarity. The query is classified into one of four query types, each of which is fed to a different agent. If VizAbility fails to respond, it attempts to suggest alternative queries.
  • Figure 3: User questions are initially categorized by query type via an LLM guided by few-shot prompting. We populate the prompt with sample questions and their corresponding ground truth classifications, which we extract from the validation set. Within each query type, only the validation questions with the highest cosine similarity to the user query are selected.
  • Figure 4: Query-specific evaluation for Analytical and Visual queries. We parse the chart's transformed data set and aggregate color encoding within a CSV file, which we then supply to an LLM via a CSV agent. For further context, we also populate the prompt with the user's active position within the Olli Tree, in addition to a text representation of the Tree itself.
  • Figure 5: Query-specific evaluation for Navigation queries. We pass a text representation of the Olli Tree and the addresses of corresponding nodes within the Tree to an LLM alongside the user question. With the aid of few-shot prompting, the LLM then identifies the starting and ending nodes within the Olli Tree. Should the starting node not be explicitly mentioned within the question, the model instead utilizes the user's current location within the Tree. We then execute a breadth-first search algorithm and relay the shortest path between starting and ending nodes back to the user.
  • ...and 2 more figures
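
The path-finding step described in Figure 5 can be illustrated with a minimal sketch. It is not the authors' implementation: the toy tree, the dotted node addresses, and the helper names `build_moves` and `shortest_path` are all hypothetical. The sketch also assumes that one key press moves between a parent and any of its children, or between adjacent siblings, which may differ from Olli's actual key bindings.

```python
from collections import deque

# Hypothetical toy tree: node address -> list of child addresses.
# The addresses are illustrative and do not reflect Olli's real format.
TREE = {
    "1": ["1.1", "1.2"],
    "1.1": ["1.1.1", "1.1.2"],
    "1.2": ["1.2.1", "1.2.2"],
    "1.1.1": [], "1.1.2": [], "1.2.1": [], "1.2.2": [],
}


def build_moves(tree):
    """Map each node to the nodes assumed reachable with one key press."""
    moves = {node: [] for node in tree}
    for parent, children in tree.items():
        for i, child in enumerate(children):
            moves[parent].append(child)  # narrow the scope
            moves[child].append(parent)  # widen the scope
            if i > 0:  # step between adjacent siblings
                moves[child].append(children[i - 1])
                moves[children[i - 1]].append(child)
    return moves


def shortest_path(tree, start, end):
    """Breadth-first search; returns the node list from start to end, or None."""
    moves = build_moves(tree)
    previous = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in moves[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # end node is not reachable from start


print(shortest_path(TREE, "1.1.2", "1.2.1"))
```

Because breadth-first search visits nodes in order of increasing distance, the first time it reaches the end node it has found a path with the fewest key presses under the assumed move model.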