
ExTTNet: A Deep Learning Algorithm for Extracting Table Texts from Invoice Images

Adem Akdoğan, Murat Kurt

TL;DR

The paper presents ExTTNet, a deep learning solution for autonomously extracting product table text from invoice images. It combines OCR via Tesseract with engineered features and a multilayer neural network trained on an RTX 3090, achieving a reported F1 score of 0.92 on a held-out set. Evaluations show high precision and recall (≈0.93) and an overall accuracy of 0.92, validated on a dataset of 8794 German invoices. The approach emphasizes token-level prediction of table elements rather than whole-table localization, enabling partial corrections and potential workflow savings in accounting. Future work aims to integrate richer image cues and advanced techniques to further enhance performance and robustness.
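Because the reported scores are token-level, it may help to see how precision, recall, and F1 are computed when each OCR token carries a binary label (1 = table element, 0 = not). The following is a minimal sketch using the standard metric definitions; the function name `token_prf` and the toy labels are illustrative assumptions, not the paper's evaluation code or data.

```python
from typing import Sequence, Tuple


def token_prf(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for binary token labels.

    Hypothetical helper: 1 marks a table element, 0 marks any other token.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # table tokens found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed table tokens

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1


# Toy example (made-up labels, not data from the paper).
truth = [1, 1, 1, 0, 0, 1]
preds = [1, 1, 0, 0, 1, 1]
precision, recall, f1 = token_prf(truth, preds)
print(precision, recall, f1)
```

Scoring per token means a partly correct table still earns credit for every correctly labeled token, which matches the emphasis on partial corrections.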

Abstract

In this work, product tables in invoices are extracted autonomously by a deep learning model named ExTTNet. First, text is obtained from invoice images using Optical Character Recognition (OCR); the Tesseract OCR engine [37] is used for this step. Next, additional features are derived from the OCR output through feature extraction in order to improve accuracy. Each text obtained from OCR is labeled according to whether it is a table element or not. A multilayer artificial neural network is used as the model. Training was carried out on an Nvidia RTX 3090 graphics card and took 162 minutes. The resulting F1 score is 0.92.
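The abstract does not list the engineered features, so the following is only a sketch of what per-token feature extraction could look like. It assumes each OCR token comes with its text and a pixel bounding box (`left`, `top`, `width`, `height`, the box fields Tesseract reports per word). The `OcrToken` class, the `token_features` function, and the chosen features are all hypothetical and are not taken from ExTTNet.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OcrToken:
    """One OCR word with its pixel bounding box (hypothetical container)."""
    text: str
    left: int
    top: int
    width: int
    height: int


def token_features(tok: OcrToken, page_w: int, page_h: int) -> List[float]:
    """Build an illustrative feature vector for one token.

    The features are assumptions for this sketch: the normalized box
    geometry plus a few simple text statistics.
    """
    n_chars = len(tok.text)
    n_digits = sum(ch.isdigit() for ch in tok.text)
    return [
        (tok.left + tok.width / 2) / page_w,         # normalized x center
        (tok.top + tok.height / 2) / page_h,         # normalized y center
        tok.width / page_w,                          # normalized width
        tok.height / page_h,                         # normalized height
        n_digits / n_chars if n_chars else 0.0,      # share of digit characters
        1.0 if any(c in ",." for c in tok.text) else 0.0,  # decimal/thousands separator present
        float(n_chars),                              # token length
    ]


# Usage with a made-up price token on a 1000 x 2000 pixel page.
price = OcrToken(text="12,50", left=100, top=200, width=50, height=20)
features = token_features(price, page_w=1000, page_h=2000)
print(features)
```

Vectors of this kind, one per token, would then be paired with the table / non-table label and passed to a multilayer network for binary classification.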

Paper Structure (8 sections, 4 figures, 3 tables)

This paper contains 8 sections, 4 figures, 3 tables.

Figures (4)

  • Figure 1: OCR process.
  • Figure 2: Preparation and training procedure for documents employing our deep learning model (ExTTNet).
  • Figure 3: Architecture of our deep learning model (ExTTNet).
  • Figure 4: Sample invoice output generated using our deep learning model (ExTTNet). The text highlighted in green signifies the table elements detected by our deep learning model (ExTTNet).