ChartLens: Fine-grained Visual Attribution in Charts

Manan Suri, Puneet Mathur, Nedim Lipka, Franck Dernoncourt, Ryan A. Rossi, Dinesh Manocha

TL;DR

This work tackles hallucinations in chart-focused multimodal language models by introducing the task of Post-Hoc Visual Attribution for Charts and ChartLens, a grounding method built on segmentation and set-of-marks (SoM) prompting. It formalizes the attribution objective as a mapping $f:(c,v)\mapsto \mathcal{A}_{c,v}$ with three evaluation criteria (relevance, completeness, and precision), then presents ChartVA-Eval, a benchmark of 1200+ samples drawn from synthetic and real-world sources for fine-grained attribution assessment. ChartLens combines heuristic and SAM-based segmentation to produce visual marks, which are then used to ground model responses via SoM prompting and chain-of-thought validation, improving attribution accuracy over baselines by 26-66%. The approach is validated on bar, line, and pie charts drawn from datasets such as MATSA, PlotQA, and ChartQA, and targets reliable chart interpretation in domains like finance and policy. The work lays a foundation for transparent, verifiable chart reasoning in critical applications and points to future integration with textual elements and broader forms of visual data.

Abstract

The growing capabilities of multimodal large language models (MLLMs) have advanced tasks like chart understanding. However, these models often suffer from hallucinations, where generated text sequences conflict with the provided visual data. To address this, we introduce Post-Hoc Visual Attribution for Charts, which identifies fine-grained chart elements that validate a given chart-associated response. We propose ChartLens, a novel chart attribution algorithm that uses segmentation-based techniques to identify chart objects and employs set-of-marks prompting with MLLMs for fine-grained visual attribution. Additionally, we present ChartVA-Eval, a benchmark with synthetic and real-world charts from diverse domains like finance, policy, and economics, featuring fine-grained attribution annotations. Our evaluations show that ChartLens improves fine-grained attributions by 26-66%.
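To make the attribution idea concrete, the following is a minimal sketch of how mark-based attribution could be represented once chart elements have been segmented and numbered. It is not the authors' implementation: the `Mark` type, the `attribute` helper, and the set-overlap `precision_recall` score are hypothetical names introduced here for illustration, and the overlap score is only an assumed stand-in for the paper's relevance, completeness, and precision criteria. The segmentation step and the MLLM call are deliberately left out; the sketch starts from mark IDs that a model is assumed to have selected.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Mark:
    """A numbered chart element (hypothetical type, not from the paper)."""
    mark_id: int
    bbox: Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def attribute(marks: Iterable[Mark], selected_ids: Iterable[int]) -> List[Mark]:
    """Map selected mark IDs back to chart regions, skipping unknown IDs."""
    by_id = {m.mark_id: m for m in marks}
    return [by_id[i] for i in sorted(set(selected_ids)) if i in by_id]


def precision_recall(predicted_ids: Iterable[int],
                     gold_ids: Iterable[int]) -> Tuple[float, float]:
    """Set-overlap score between predicted and annotated mark IDs.

    Assumption: used here only as an illustrative proxy, not as the
    metric defined in the paper.
    """
    predicted, gold = set(predicted_ids), set(gold_ids)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    marks = [Mark(1, (0, 0, 10, 40)), Mark(2, (12, 0, 22, 25)), Mark(3, (24, 0, 34, 60))]
    # Suppose the model cites marks 3 and 1, plus an ID that does not exist.
    regions = attribute(marks, [3, 1, 9])
    print([m.mark_id for m in regions])
    print(precision_recall([1, 3], [1, 2]))
```

Representing an attribution as a set of mark IDs keeps the model's output discrete and easy to verify against annotations, which is the property that set-of-marks prompting relies on.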

Paper Structure

This paper contains 27 sections, 2 equations, 5 figures, 4 tables, 2 algorithms.

Figures (5)

  • Figure 1: We introduce the task of visual attribution for charts (➊), which grounds textual responses to specific regions in the chart image. This promotes reliable understanding by enabling users to verify claims (➋), thus detecting potentially hallucinated responses and identifying chart-response misalignments.
  • Figure 2: ChartLens: ➊ Chart elements, such as bars and pie sectors, are extracted through heuristic-guided methods and refined using SAM, while lines are segmented using Lineformer. ➋ The segmented elements are then marked, labeled, and used to prompt multimodal LLMs, enabling fine-grained attribution by grounding textual responses to visual regions.
  • Figure 3: Qualitative comparison of our ChartLens with the baselines. ChartLens is able to effectively localize relevant, complete and precise attributions in the chart images.
  • Figure 4: Overview of annotation guidelines provided to annotators for ensuring accurate and consistent visual attributions.
  • Figure 5: The design decision option space for MATSA synthetic charts, illustrating the various configurable elements and parameters available for customizing chart generation. This visual representation highlights the flexibility in chart design, encompassing aspects such as chart type, data presentation styles, and visual encoding options.