Data Extraction from Charts via Single Deep Neural Network

Xiaoyi Liu, Diego Klabjan, Patrick N. Bless

TL;DR

The paper tackles automatic data extraction from bar and pie charts by converting charts into relational data tables using a unified deep neural network. It presents a chart-type classifier followed by specialized bar- and pie-chart extraction models that integrate object detection, text recognition, and inter-component matching, including pie slice angle prediction. Key contributions include a unified detection-recognition-matching pipeline, an angle-based pie chart extension, and multi-task loss formulations. Results show strong performance on simulated data and reveal gaps when generalizing to Internet-sourced charts, highlighting the need for richer training data to handle variability and small objects.

Abstract

Automatic data extraction from charts is challenging for two reasons: there exist many relations among objects in a chart, which is not a common consideration in general computer vision problems; and different types of charts may not be processed by the same model. To address these problems, we propose a framework of a single deep neural network, which consists of object detection, text recognition and object matching modules. The framework handles both bar and pie charts, and it may also be extended to other types of charts by slight revisions and by augmenting the training data. Our model performs successfully on 79.4% of test simulated bar charts and 88.0% of test simulated pie charts, while for charts outside of the training domain its success rate drops to 57.5% and 62.3%, respectively.
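
As a rough illustration of the final step the abstract describes, turning matched chart components into a relational table, the sketch below converts per-slice angles of a pie chart into percentage rows. The function name `pie_angles_to_table`, the input format (legend label paired with an angle in degrees), and the normalization by the sum of angles are all assumptions made for illustration; they are not taken from the paper and do not represent the authors' implementation.

```python
from typing import List, Tuple


def pie_angles_to_table(slices: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Convert (label, angle_in_degrees) pairs into (label, percent) rows.

    Hypothetical helper: assumes each slice has already been matched
    to its legend label by an upstream matching step.
    """
    total = sum(angle for _, angle in slices)
    if total <= 0:
        raise ValueError("slice angles must sum to a positive value")
    # Normalize by the sum of the given angles rather than by 360, so that
    # predicted angles that do not add up to a full circle are still
    # mapped to shares of the whole.
    return [(label, 100.0 * angle / total) for label, angle in slices]


rows = pie_angles_to_table([("A", 180.0), ("B", 90.0), ("C", 90.0)])
print(rows)  # [('A', 50.0), ('B', 25.0), ('C', 25.0)]
```

Normalizing by the angle sum is one possible design choice here; dividing by 360 would instead expose any error in the predicted angles directly in the output.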

Paper Structure

This paper contains 21 sections, 9 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Framework for charts data extraction
  • Figure 2: Angle prediction branch for pie chart extraction
  • Figure 3: Pie chart object-matching model
  • Figure 4: Bar chart samples with "good," "OK" and "bad" predictions; dashed circles indicate problematic regions
  • Figure 5: Pie chart samples with "good" and "bad" predictions; dashed circles indicate challenging regions