PlotEdit: Natural Language-Driven Accessible Chart Editing in PDFs via Multimodal LLM Agents

Kanika Goswami, Puneet Mathur, Ryan Rossi, Franck Dernoncourt

TL;DR

Charts embedded in PDFs and scans often lack accessible source data and editing paths. PlotEdit addresses this by coordinating five LLM agents (Chart2Table, Chart2Vision, Chart2Code, an Instruction Decomposition Agent, and a Multimodal Editing Agent), guided by Code, Visual, and Numeric self-reflection feedback and by perceptual fidelity checks via GPT-4V, plus a re-plotter that reconstructs the final image. Given an input chart figure $f$ and a user request $r$, the system de-renders the chart into its components, edits the data, style, and rendering code, and outputs an edited figure $\hat{f}$ that enacts the requested changes while preserving the original structure through fidelity constraints. Empirically, PlotEdit outperforms baselines such as ChartLLaMA and ChartReformer on the ChartCraft dataset, achieving 9–14% gains across style, layout, format, and data-centric edits; ablations underscore the contribution of multimodal feedback and fidelity-aware editing. The framework targets accessible chart editing in PDFs and scanned documents, with potential integration into PDF readers to streamline workflows and improve accessibility for visually impaired users.

Abstract

Chart visualizations, while essential for data interpretation and communication, are predominantly accessible only as images in PDFs, lacking source data tables and stylistic information. To enable effective editing of charts in PDFs or digital scans, we present PlotEdit, a novel multi-agent framework for natural language-driven end-to-end chart image editing via self-reflective LLM agents. PlotEdit orchestrates five LLM agents: (1) Chart2Table for data table extraction, (2) Chart2Vision for style attribute identification, (3) Chart2Code for retrieving rendering code, (4) Instruction Decomposition Agent for parsing user requests into executable steps, and (5) Multimodal Editing Agent for implementing nuanced chart component modifications - all coordinated through multimodal feedback to maintain visual fidelity. PlotEdit outperforms existing baselines on the ChartCraft dataset across style, layout, format, and data-centric edits, enhancing accessibility for visually challenged users and improving novice productivity.
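
The control flow described above (de-render, decompose the request, edit, check with feedback, keep or retry) can be illustrated with a minimal Python sketch. It is not the authors' implementation: `ChartState`, `plot_edit`, the `max_retries` parameter, and all `toy_*` functions are hypothetical names, and the LLM agents are replaced by deterministic stand-ins so the loop can be run without any model.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ChartState:
    """De-rendered chart: data table, style attributes, rendering code."""
    table: Dict[str, List[float]]
    style: Dict[str, str]
    code: str


# Hypothetical critic signature: (before, after, step) -> True if the edit is accepted.
Critic = Callable[[ChartState, ChartState, str], bool]


def plot_edit(figure, request, derender, decompose, edit, critics, max_retries=2):
    """Apply each decomposed step, keeping an edit only if every critic accepts it."""
    state = derender(figure)
    for step in decompose(request):
        for _ in range(max_retries + 1):
            candidate = edit(state, step)
            if all(critic(state, candidate, step) for critic in critics):
                state = candidate
                break
        # If no attempt passes, the step is skipped and the prior state is kept.
    return state


# --- Toy stand-ins for the LLM agents (assumptions for illustration only) ---

def toy_derender(figure: bytes) -> ChartState:
    # Stands in for Chart2Table, Chart2Vision, and Chart2Code together.
    return ChartState(
        table={"Q1": [3.0, 5.0]},
        style={"color": "blue"},
        code="bar(labels, values, color=color)",
    )


def toy_decompose(request: str) -> List[str]:
    # Stands in for the Instruction Decomposition Agent.
    return [part.strip() for part in request.split(" and ")]


def toy_edit(state: ChartState, step: str) -> ChartState:
    # Stands in for the Multimodal Editing Agent; handles "set <key> to <value>".
    _, key, _, value = step.split(" ", 3)
    return replace(state, style={**state.style, key: value})


def table_unchanged(before: ChartState, after: ChartState, step: str) -> bool:
    # A numeric-style check: a style-only edit must not alter the data table.
    return before.table == after.table


result = plot_edit(
    b"", "set color to red and set title to Sales",
    toy_derender, toy_decompose, toy_edit, [table_unchanged],
)
print(result.style)
```

In this sketch a rejected edit leaves the chart state untouched, which is one simple way to express a fidelity constraint; how PlotEdit actually combines its Code, Visual, and Numeric feedback is not specified in the text above.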

Paper Structure

This paper contains 3 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: PlotEdit accurately edits chart images as per user requests by orchestrating LLM agents: (1) Chart2Table for data table extraction, (2) Chart2Vision for style attribute identification, (3) Chart2Code for retrieving rendering code, (4) Instruction Decomposition Agent for parsing user requests into executable steps, and (5) Multimodal Editing Agent for implementing nuanced chart component modifications coordinated through multimodal feedback to maintain visual fidelity.