ChartEye: A Deep Learning Framework for Chart Information Extraction

Osama Mustafa, Muhammad Khizer Ali, Momina Moetesum, Imran Siddiqi

TL;DR

This paper tackles automatic extraction of explicit information from chart images, a task made harder by the wide variety of chart layouts. It introduces ChartEye, a multi-task framework that combines Swin Transformer-based chart-type and text-role classification with YOLOv7 for text detection and ESRGAN-based text upscaling, followed by TPS-ResNet-BiLSTM-Attn text recognition. On the ICPR2022 CHARTINFO dataset, the approach achieves 0.97 F1 for chart-type classification, 0.91 F1 for text-role classification, and 0.95 mAP for text detection, demonstrating strong performance across chart types. A key contribution is the use of hierarchical transformers to capture positional and relational cues, which yields a generic pipeline applicable to multiple chart types. The ESRGAN-based text enhancement further improves recognition, supporting reliable extraction of structured data from visual charts for downstream analytics.
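
As a rough illustration of how the stages listed above could be chained, here is a minimal Python sketch. It is not the authors' implementation. Every stage is passed in as a plain callable, so the Swin, YOLOv7, ESRGAN and TPS-ResNet-BiLSTM-Attn models can be replaced by stubs. The parameter names `classify_chart`, `detect_text`, `crop`, `upscale`, `recognize` and `classify_role` are hypothetical. The order of the recognition and role-classification stages, and the inputs the role classifier receives, are also assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixel coordinates


@dataclass
class TextElement:
    box: Box
    text: str = ""
    role: str = ""


@dataclass
class ChartResult:
    chart_type: str
    elements: List[TextElement] = field(default_factory=list)


def run_pipeline(
    image,
    classify_chart: Callable[[object], str],            # hypothetical: Swin chart-type classifier
    detect_text: Callable[[object], List[Box]],         # hypothetical: YOLOv7 text detector
    crop: Callable[[object, Box], object],              # hypothetical: cut one text region out
    upscale: Callable[[object], object],                # hypothetical: ESRGAN super-resolution
    recognize: Callable[[object], str],                 # hypothetical: TPS-ResNet-BiLSTM-Attn OCR
    classify_role: Callable[[object, Box, str], str],   # hypothetical: Swin text-role classifier
) -> ChartResult:
    # Stage 1: classify the whole chart image.
    result = ChartResult(chart_type=classify_chart(image))
    # Stage 2: detect text regions, one box per region.
    for box in detect_text(image):
        # Stage 3: upscale the small crop before recognition.
        text = recognize(upscale(crop(image, box)))
        # Stage 4: assign a text role (assumed to see image, box and chart type).
        role = classify_role(image, box, result.chart_type)
        result.elements.append(TextElement(box, text, role))
    return result


# Usage with stubs: a toy 6x8 "image" stored as nested lists.
image = [[0] * 8 for _ in range(6)]
res = run_pipeline(
    image,
    classify_chart=lambda img: "vertical_bar",
    detect_text=lambda img: [(0, 0, 4, 2), (4, 3, 8, 6)],
    crop=lambda img, b: [row[b[0]:b[2]] for row in img[b[1]:b[3]]],
    # 2x nearest-neighbour upscaling stands in for ESRGAN.
    upscale=lambda p: [[v for v in row for _ in range(2)] for row in p for _ in range(2)],
    # The stub "OCR" just reports the size of the upscaled patch.
    recognize=lambda p: f"{len(p)}x{len(p[0])}",
    classify_role=lambda img, b, t: "axis_title" if b[1] == 0 else "tick_label",
)
```

Keeping each model behind a callable makes the orchestration testable without weights or a GPU. In a real system, each stub would wrap a trained model.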

Abstract

The widespread use of charts and infographics for data visualization across many domains has inspired recent research in automated chart understanding. However, extracting information from chart images is a complex multi-task process because of style variations, which makes it challenging to design an end-to-end system. In this study, we propose a deep learning-based framework that addresses key steps of the chart information extraction pipeline. The framework uses hierarchical vision transformers for chart-type and text-role classification, and YOLOv7 for text detection. The detected text is then enhanced with Enhanced Super-Resolution Generative Adversarial Networks (ESRGAN) to improve the output of the OCR stage. Experimental results on a benchmark dataset show that our framework performs well at every stage, with F1-scores of 0.97 for chart-type classification and 0.91 for text-role classification, and a mean Average Precision of 0.95 for text detection.
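
The abstract reports F1-scores for the two classification tasks and mAP for text detection. It does not state the averaging scheme or the IoU threshold. The sketch below shows one common way to compute both metrics, assuming macro-averaged F1 and all-point interpolated average precision at IoU ≥ 0.5 with greedy matching, for a single class in a single image. The official CHARTINFO evaluation may differ. Under this reading, mAP would be the mean of this AP over classes (and, for COCO-style mAP, over several IoU thresholds). The function names are illustrative.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Tuple[float, Box]], gts: List[Box], thr: float = 0.5) -> float:
    """All-point interpolated AP for one class; preds are (confidence, box) pairs."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    precisions, recalls = [], []
    tp_cum = 0
    for rank, (_, box) in enumerate(sorted(preds, key=lambda p: -p[0]), start=1):
        # Greedily match to the still-unmatched ground-truth box with the highest IoU.
        best_iou, best_j = 0.0, -1
        for j, g in enumerate(gts):
            o = iou(box, g)
            if not matched[j] and o > best_iou:
                best_iou, best_j = o, j
        if best_j >= 0 and best_iou >= thr:
            matched[best_j] = True
            tp_cum += 1
        precisions.append(tp_cum / rank)
        recalls.append(tp_cum / len(gts))
    # Make precision non-increasing from right to left (the precision envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the interpolated precision over each recall increment.
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


# Toy check with made-up labels and boxes (not CHARTINFO data).
f1 = macro_f1(["bar", "bar", "line", "scatter"], ["bar", "line", "line", "scatter"])
ap = average_precision(
    [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (21, 0, 30, 10))],
    [(0, 0, 10, 10), (20, 0, 30, 10)],
)
```

In the toy example, the per-class F1 values are 2/3, 2/3 and 1, so the macro-F1 is 7/9. Detection yields TP, FP, TP, which gives an AP of 5/6.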

Paper Structure (16 sections, 9 equations, 9 figures, 5 tables)

Figures (9)

  • Figure 1: (a) Line plot [guerrini2005genetic]; (b) vertical bar chart [swai2006surveillance]; (c) scatter plot [baxter2009root]; (d) horizontal bar chart [santo2012trends]
  • Figure 2: System pipeline: a diagram illustrating how an input image is processed through the successive steps of the framework.
  • Figure 3: High-level architecture of the Swin Transformer
  • Figure 4: Architecture of two successive Swin Transformer blocks. Multi-head self-attention modules with regular and shifted window configurations (W-MSA and SW-MSA, respectively) are used.
  • Figure 5: Detected text upscaling: a pipeline that uses an enhanced super-resolution GAN (ESRGAN) to increase the resolution of low-resolution detected-text images (the circled dots indicate that there are more basic blocks).
  • ...and 4 more figures
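
The W-MSA/SW-MSA pair in Figure 4 can be illustrated with the following NumPy sketch, which follows the windowing scheme of the original Swin Transformer. Attention is computed inside non-overlapping M x M windows. In every second block, the feature map is cyclically shifted by floor(M/2) before partitioning, so that information crosses window borders. The sketch simplifies Swin in several ways: it uses identity Q/K/V projections and a single head, and it omits the relative position bias. It also omits the attention mask that real SW-MSA applies to wrapped-around regions. The function names are hypothetical.

```python
import numpy as np


def window_partition(x: np.ndarray, m: int) -> np.ndarray:
    """Split an (H, W, C) map into non-overlapping m x m windows -> (num_windows, m*m, C)."""
    h, w, c = x.shape
    assert h % m == 0 and w % m == 0, "H and W must be multiples of the window size"
    x = x.reshape(h // m, m, w // m, m, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (window rows, window cols, m, m, C)
    return x.reshape(-1, m * m, c)


def shifted_window_partition(x: np.ndarray, m: int) -> np.ndarray:
    """SW-MSA input: cyclically shift the map up and left by m // 2, then partition."""
    s = m // 2
    return window_partition(np.roll(x, shift=(-s, -s), axis=(0, 1)), m)


def window_self_attention(windows: np.ndarray) -> np.ndarray:
    """Single-head softmax attention computed independently inside each window.

    Simplified: identity Q/K/V projections, no relative position bias, no mask.
    """
    d = windows.shape[-1]
    scores = windows @ windows.transpose(0, 2, 1) / np.sqrt(d)  # (N, T, T)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ windows  # (N, T, C)


# Toy 4x4 feature map with one channel (values 0..15) and window size M = 2.
x = np.arange(16, dtype=float).reshape(4, 4, 1)
w = window_partition(x, 2)            # regular windows; the first holds 0, 1, 4, 5
sw = shifted_window_partition(x, 2)   # shifted windows; the first holds 5, 6, 9, 10
out = window_self_attention(w)        # same shape as w
```

In the shifted case, the last window holds 15, 12, 3 and 0. These tokens come from opposite edges of the map after the roll, which is why the real SW-MSA masks attention between them. Because each window attends only to its own M² tokens, the attention cost grows linearly with the number of windows rather than quadratically with image size. This is the main motivation for windowed attention in Swin.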