
Leveraging Vision Capabilities of Multimodal LLMs for Automated Data Extraction from Plots

Maciej P. Polak, Dane Morgan

TL;DR

This work addresses the manual bottleneck of extracting quantitative data from plots by introducing PlotExtract, a zero-shot workflow that uses vision-capable LLMs to digitize data from two-axis plots. The method sequentially extracts the data, generates and runs replotting code, and verifies accuracy through visual comparison, all without fine-tuning the model. Across synthetic and published datasets, it achieves average data-extraction errors in the low single-digit percent range, with precision at or near 100% and recall around 85–90%, showing that the approach is viable for high-throughput use. The approach promises automated, scalable data extraction from plots, with further improvements expected as multimodal LLMs advance.
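The three stages named above (extract, replot, verify) can be sketched as a small driver. This is a minimal sketch under stated assumptions, not the authors' PlotExtract implementation: the prompts, the hypothetical `llm` callable (a prompt plus a list of image bytes in, reply text out), the hypothetical `render` callable, and the `x, y` reply format are all illustrative stand-ins. In PlotExtract the replotting code is itself generated by the LLM and executed; here `render` abstracts that step.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical interfaces (assumptions, not a real library API):
#   llm(prompt, images) -> reply text from a vision-capable model
#   render(points)      -> image bytes of a replot of the extracted points
LLM = Callable[[str, Sequence[bytes]], str]
Renderer = Callable[[List[Point]], bytes]

# Illustrative prompts only; the actual PlotExtract prompts are not reproduced here.
EXTRACT_PROMPT = "List every data point in this plot as 'x, y', one per line."
VERIFY_PROMPT = "Do these two plots show the same data? Answer YES or NO."


def parse_points(reply: str) -> List[Point]:
    """Parse 'x, y' lines from a model reply, skipping malformed lines."""
    points: List[Point] = []
    for line in reply.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 2:
            continue
        try:
            points.append((float(parts[0]), float(parts[1])))
        except ValueError:
            continue
    return points


def plot_extract(image: bytes, llm: LLM, render: Renderer) -> Tuple[List[Point], bool]:
    """Extract points, replot them, and ask the model to compare the two images."""
    # Step 1: zero-shot extraction of the data points.
    points = parse_points(llm(EXTRACT_PROMPT, [image]))
    if not points:
        return points, False  # nothing usable was extracted
    # Step 2: replot the extracted data (LLM-generated code in PlotExtract).
    replot = render(points)
    # Step 3: visual verification of the original against the replot.
    verdict = llm(VERIFY_PROMPT, [image, replot])
    return points, verdict.strip().upper().startswith("YES")
```

The returned flag is one plausible way to separate plots whose extraction passes the visual check from those that do not; a stub `llm` that returns fixed text is enough to exercise the control flow without calling a real model.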

Abstract

Automated data extraction from research texts has been steadily improving, and the emergence of large language models (LLMs) has accelerated progress further. Extracting data from plots in research papers, however, has remained complex enough that it is still done predominantly by hand. We show that current multimodal LLMs, given proper instructions and engineered workflows, can accurately extract data from plots. This capability is inherent to the pretrained models and can be accessed without fine-tuning through PlotExtract, a chain-of-thought sequence of zero-shot engineered prompts. We demonstrate PlotExtract and assess its performance on synthetic and published plots, considering only plots with two axes. For plots identified as extractable, PlotExtract finds data points with over 90% precision and around 90% recall, with errors in x and y positions of around 5% or lower. These results demonstrate that multimodal LLMs are a viable path to high-throughput data extraction from plots and, in many circumstances, can replace current manual extraction methods.
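Precision, recall, and position errors of this kind come from matching extracted points to reference points. The sketch below shows one such pointwise comparison. The greedy nearest-neighbour matching, the tolerance `tol`, and the normalization of errors by each axis's data span are assumptions for illustration, not necessarily the paper's exact definitions.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def pointwise_metrics(true_pts: List[Point], extracted_pts: List[Point],
                      tol: float = 0.05) -> Dict[str, float]:
    """Greedy one-to-one matching in axis-normalized coordinates (an assumption)."""
    if not true_pts:
        raise ValueError("need at least one reference point")
    xs = [x for x, _ in true_pts]
    ys = [y for _, y in true_pts]
    # Normalize by the reference data span so x and y errors are comparable.
    x_span = (max(xs) - min(xs)) or 1.0
    y_span = (max(ys) - min(ys)) or 1.0

    unmatched = list(range(len(extracted_pts)))
    dx_errs: List[float] = []
    dy_errs: List[float] = []
    for tx, ty in true_pts:
        # Find the closest extracted point that has not been claimed yet.
        best, best_d = None, math.inf
        for j in unmatched:
            ex, ey = extracted_pts[j]
            d = math.hypot((ex - tx) / x_span, (ey - ty) / y_span)
            if d < best_d:
                best, best_d = j, d
        if best is not None and best_d <= tol:
            unmatched.remove(best)
            ex, ey = extracted_pts[best]
            dx_errs.append(abs(ex - tx) / x_span)
            dy_errs.append(abs(ey - ty) / y_span)

    matched = len(dx_errs)
    return {
        # Fraction of extracted points that correspond to a real point.
        "precision": matched / len(extracted_pts) if extracted_pts else 0.0,
        # Fraction of real points that were recovered.
        "recall": matched / len(true_pts),
        "mean_dx": sum(dx_errs) / matched if matched else math.nan,
        "mean_dy": sum(dy_errs) / matched if matched else math.nan,
    }
```

Precision drops when the model invents points and recall drops when it misses them; a smaller `tol` makes both metrics stricter.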

Paper Structure

This paper contains 9 sections, 3 figures, and 1 table.

Figures (3)

  • Figure 1: An example of plot data extraction. The source plot is Fig. 19 in [example_fig19].
  • Figure 2: A simplified schematic of the vision LLM-based plot data extraction workflow.
  • Figure 3: A schematic depiction of the two evaluation methods. Panel (a) shows the original and extracted datapoints on a single plot, connected with dashed lines to guide the eye. Panel (b) shows a pointwise comparison, and panel (c) shows an interpolation comparison.
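The interpolation comparison in panel (c) of Figure 3 can be sketched in the same spirit. Assuming it evaluates the extracted series at the original x positions and compares y values, the sketch below reports the mean absolute y error normalized by the reference y span. Linear interpolation, that normalization, and skipping reference points outside the extracted x range are all assumptions, not the paper's stated procedure.

```python
import math
from bisect import bisect_left
from typing import List, Tuple

Point = Tuple[float, float]


def interpolation_error(true_pts: List[Point], extracted_pts: List[Point]) -> float:
    """Mean |y_interp - y_true| / y_span over reference points inside the extracted x range."""
    if not true_pts or not extracted_pts:
        raise ValueError("need reference and extracted points")
    ext = sorted(extracted_pts)  # sort by x so the series can be interpolated
    ex = [x for x, _ in ext]
    ey = [y for _, y in ext]
    ys_true = [y for _, y in true_pts]
    y_span = (max(ys_true) - min(ys_true)) or 1.0

    errs: List[float] = []
    for tx, ty in true_pts:
        if tx < ex[0] or tx > ex[-1]:
            continue  # no extrapolation beyond the extracted data
        i = bisect_left(ex, tx)
        if ex[i] == tx:
            y = ey[i]  # exact x match: use the extracted y directly
        else:
            # Linear interpolation between the two bracketing extracted points.
            x0, x1, y0, y1 = ex[i - 1], ex[i], ey[i - 1], ey[i]
            y = y0 + (y1 - y0) * (tx - x0) / (x1 - x0)
        errs.append(abs(y - ty) / y_span)
    return sum(errs) / len(errs) if errs else math.nan
```

Unlike the pointwise comparison, this measure in the sketch does not need a one-to-one correspondence between extracted and original points, so it tolerates extractions that sample a curve at different x positions.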