Data Extraction from Charts via Single Deep Neural Network
Xiaoyi Liu, Diego Klabjan, Patrick N. Bless
TL;DR
The paper tackles automatic data extraction from bar and pie charts by converting charts into relational data tables using a unified deep neural network. It presents a chart-type classifier followed by specialized bar- and pie-chart extraction models that integrate object detection, text recognition, and inter-component matching, including pie slice angle prediction. Key contributions include a unified detection-recognition-matching pipeline, an angle-based pie chart extension, and multi-task loss formulations. Results show strong performance on simulated data and reveal gaps when generalizing to Internet-sourced charts, highlighting the need for richer training data to handle variability and small objects.
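As a rough illustration of the last step of pie chart extraction, the sketch below turns per-slice angles into rows of a relational table. It is not the authors' implementation: the function name `angles_to_table`, the input format (one label and one angle in degrees per slice), and the choice to normalize by the sum of the angles are all assumptions made for this example.

```python
from typing import List, Tuple


def angles_to_table(labels: List[str], angles_deg: List[float]) -> List[Tuple[str, float]]:
    """Convert per-slice angles (degrees) into (label, percentage) rows.

    Hypothetical helper: assumes an upstream model has already matched
    each legend label to one predicted slice angle.
    """
    if len(labels) != len(angles_deg):
        raise ValueError("each label needs exactly one angle")
    total = sum(angles_deg)
    if total <= 0:
        raise ValueError("angles must sum to a positive value")
    # Normalize by the predicted total instead of 360 so that small
    # prediction errors do not make the shares sum to something other than 100.
    return [(label, 100.0 * angle / total) for label, angle in zip(labels, angles_deg)]


rows = angles_to_table(["A", "B", "C"], [180.0, 90.0, 90.0])
for label, share in rows:
    print(f"{label}\t{share:.1f}%")
```

Normalizing by the predicted total is one possible design choice; dividing by 360 directly would instead preserve the raw predictions at the cost of shares that may not sum to 100%.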
Abstract
Automatic data extraction from charts is challenging for two reasons: a chart contains many relations among its objects, which general computer vision problems rarely need to consider; and different types of charts may not be processable by the same model. To address these problems, we propose a framework built on a single deep neural network that consists of object detection, text recognition, and object matching modules. The framework handles both bar and pie charts, and it may be extended to other chart types with slight revisions and augmented training data. Our model succeeds on 79.4% of simulated test bar charts and 88.0% of simulated test pie charts, while on charts outside the training domain its success rate degrades to 57.5% and 62.3%, respectively.
