ChartCitor: Multi-Agent Framework for Fine-Grained Chart Visual Attribution

Kanika Goswami, Puneet Mathur, Ryan Rossi, Franck Dernoncourt

TL;DR

This paper tackles hallucination in chart question answering by grounding AI-generated responses in chart visuals through precise bounding-box citations. ChartCitor introduces a multi-agent architecture that decomposes chart understanding into chart-to-table extraction, answer reformulation, row/column/cell captioning, evidence pre-filtering and re-ranking, and cell-to-chart localization. Evaluation uses a visual grounding metric based on intersection over union (IoU) with a threshold of $\ge 0.9$ and shows ChartCitor outperforming baselines by about 9–15 percentage points across chart types. A user study indicates improved trust due to transparent citations. Overall, the approach enhances explainability and productivity in chart QA over PDFs, paving the way for more trustworthy multimodal document QA.
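To make the grounding metric concrete, the following is a minimal sketch of an IoU check between a predicted and a reference bounding box. Only the IoU threshold of 0.9 comes from the summary above. The `(x_min, y_min, x_max, y_max)` box convention and the names `iou` and `is_grounded` are assumptions for illustration, and the paper's actual evaluation code may differ.

```python
from typing import Tuple

# Assumed box convention (not stated in the source): (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if any
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero when the boxes do not overlap
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_grounded(pred: Box, gold: Box, threshold: float = 0.9) -> bool:
    """Hypothetical helper: a citation counts as correct when IoU >= threshold."""
    return iou(pred, gold) >= threshold
```

Under this sketch, a predicted box must overlap the reference box almost exactly to count as a correct citation, which is why a threshold of 0.9 is a strict test of localization.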

Abstract

Large Language Models (LLMs) can perform chart question-answering tasks but often generate unverified hallucinated responses. Existing answer attribution methods struggle to ground responses in source charts due to limited visual-semantic context, complex visual-text alignment requirements, and difficulties in bounding box prediction across complex layouts. We present ChartCitor, a multi-agent framework that provides fine-grained bounding box citations by identifying supporting evidence within chart images. The system orchestrates LLM agents to perform chart-to-table extraction, answer reformulation, table augmentation, evidence retrieval through pre-filtering and re-ranking, and table-to-chart mapping. ChartCitor outperforms existing baselines across different chart types. Qualitative user studies show that ChartCitor helps increase user trust in Generative AI by providing enhanced explainability for LLM-assisted chart QA and enables professionals to be more productive.

Paper Structure

This paper contains 5 sections, 2 figures.

Figures (2)

  • Figure 1: ChartCitor - a multi-agent framework that performs table extraction, answer reformulation, entity captioning, row/col retrieval, and cell localization in chart images to ground answers.
  • Figure 2: (a) Ablation analysis of multimodal feedback agents; (b) User Evaluation of ChartCitor