ChartEye: A Deep Learning Framework for Chart Information Extraction
Osama Mustafa, Muhammad Khizer Ali, Momina Moetesum, Imran Siddiqi
TL;DR
This paper tackles automatic extraction of explicit information from chart images, a problem intensified by diverse chart layouts. It introduces ChartEye, a multi-task framework that combines Swin Transformer-based chart-type and text-role classification with YOLOv7 for text detection and ESRGAN-based text upscaling, followed by TPS-ResNet-BiLSTM-Attn text recognition. On the ICPR2022 CHARTINFO dataset, the approach achieves 0.97 F1 for chart-type classification, 0.91 F1 for text-role classification, and 0.95 mAP for text detection, demonstrating strong performance across chart types. A key contribution is leveraging hierarchical transformers to capture positional and relational cues, enabling a generic pipeline applicable to multiple chart types. The ESRGAN-based text enhancement further strengthens recognition, supporting reliable extraction of structured data from visual charts for downstream analytics.
Abstract
The widespread use of charts and infographics for data visualization across domains has inspired recent research in automated chart understanding. However, extracting information from chart images is a complex, multi-task process because of style variations, which makes it challenging to design an end-to-end system. In this study, we propose a deep learning-based framework that addresses key steps in the chart information extraction pipeline. The proposed framework uses hierarchical vision transformers for chart-type and text-role classification and YOLOv7 for text detection. The detected text is then enhanced using Super-Resolution Generative Adversarial Networks to improve the recognition output of the OCR. Experimental results on a benchmark dataset show that the proposed framework achieves excellent performance at every stage, with F1-scores of 0.97 for chart-type classification and 0.91 for text-role classification, and a mean Average Precision of 0.95 for text detection.
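To make the data flow between the stages concrete, the following is a minimal sketch of how the pipeline steps could be chained. It is not the authors' implementation: the function `extract_chart_info`, the types `TextElement` and `ChartInfo`, the helper `crop`, and the list-of-lists image representation are all hypothetical names introduced here for illustration. Each model (Swin Transformer classifiers, YOLOv7 detector, ESRGAN upscaler, OCR) is passed in as a plain callable, and the order in which text-role classification runs relative to recognition is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumed representations, chosen only to keep the sketch dependency-free.
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels
Image = List[List[int]]          # 2-D grid of grayscale pixel values


@dataclass
class TextElement:
    box: Box    # where the text was detected
    text: str   # recognized string
    role: str   # text role label (label set is an assumption)


@dataclass
class ChartInfo:
    chart_type: str
    elements: List[TextElement]


def crop(image: Image, box: Box) -> Image:
    """Cut the region covered by `box` out of `image`."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def extract_chart_info(
    image: Image,
    classify_chart: Callable[[Image], str],       # e.g. a Swin Transformer classifier
    detect_text: Callable[[Image], List[Box]],    # e.g. a YOLOv7 detector
    upscale: Callable[[Image], Image],            # e.g. an ESRGAN generator
    recognize: Callable[[Image], str],            # e.g. a TPS-ResNet-BiLSTM-Attn recognizer
    classify_role: Callable[[Image, Box], str],   # e.g. a Swin Transformer classifier
) -> ChartInfo:
    """Run the stages in sequence and collect one record per detected text box."""
    chart_type = classify_chart(image)
    elements = []
    for box in detect_text(image):
        # Upscale the cropped text patch before recognition.
        patch = upscale(crop(image, box))
        elements.append(
            TextElement(box=box, text=recognize(patch), role=classify_role(image, box))
        )
    return ChartInfo(chart_type=chart_type, elements=elements)
```

Because every stage is an injected callable, any one model can be swapped or stubbed out for testing without touching the rest of the pipeline.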
