ERPA: Efficient RPA Model Integrating OCR and LLMs for Intelligent Document Processing

Osama Abdellaif, Abdelrahman Nader, Ali Hamdi

TL;DR

The paper tackles the inefficiency and accuracy challenges of processing large volumes of immigration documents with traditional RPA solutions. It introduces ERPA, a multi-stage pipeline that couples state-of-the-art OCR with fine-tuned LLMs to extract, interpret, and validate ID data, producing structured JSON and reports. Benchmark results against UiPath and Automation Anywhere show dramatic speedups, with per-document extraction around 9.94 seconds and up to 93% time savings. ERPA's dynamic adaptability to diverse document formats and scalable design offer a practical, high-throughput solution for government workflows requiring fast and reliable document processing.

Abstract

This paper presents ERPA, an innovative Robotic Process Automation (RPA) model designed to enhance ID data extraction and optimize Optical Character Recognition (OCR) tasks within immigration workflows. Traditional RPA solutions often face performance limitations when processing large volumes of documents, leading to inefficiencies. ERPA addresses these challenges by incorporating Large Language Models (LLMs) to improve the accuracy and clarity of extracted text, effectively handling ambiguous characters and complex structures. Benchmark comparisons with leading platforms like UiPath and Automation Anywhere demonstrate that ERPA significantly reduces processing times by up to 94 percent, completing ID data extraction in just 9.94 seconds. These findings highlight ERPA's potential to revolutionize document automation, offering a faster and more reliable alternative to current RPA solutions.

Paper Structure (23 sections, 5 equations, 2 figures, 1 table, 1 algorithm)

Figures (2)

  • Figure 1: ERPA system architecture. The diagram illustrates the workflow of the proposed ERPA model for automating ID document processing. The system application monitors a folder for new files (steps a, b). When a new image is detected (c), the system applies OCR to extract the relevant text features (d). The extracted text is then passed to a Large Language Model (LLM) (e), which identifies key features (f) and interprets the text (g). Once the text is interpreted, the system creates a structured JSON file (h), which is used to populate both a report (i) and the database (j). This architecture ensures efficient and accurate ID document processing while maintaining scalability for high-volume workflows.
  • Figure 2: ERPA system architecture: The diagram illustrates the workflow of the proposed ERPA model for automating ID document processing. The process begins with the system application monitoring the folder for new files (steps a, b). When a new image is detected (c), the system applies OCR to extract the relevant text features (d). The extracted text is then processed by a Large Language Model (LLM) (e), which helps to understand and interpret the text (g) while identifying key features (f). Once the text is interpreted, the system creates a structured JSON file (h) and populates both the database (j) and a report (i). This architecture ensures efficient and accurate ID document processing while maintaining scalability for high-volume workflows.
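
The workflow described in the figure captions (watch a folder, run OCR, interpret the text with an LLM, write JSON, then populate a database and a report) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the OCR engine and the LLM are passed in as plain callables, and every name here (`find_new_images`, `process_image`, `run_once`, `fake_ocr`, `fake_interpret`, the `documents` table) is hypothetical. The two `fake_*` functions are stand-ins that only mimic the shape of the real stages.

```python
import json
import sqlite3
from pathlib import Path
from typing import Callable, Dict, List, Set

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # assumed input formats


def init_db(db: sqlite3.Connection) -> None:
    """Create the (hypothetical) table that stores one row per document."""
    db.execute("CREATE TABLE IF NOT EXISTS documents (source TEXT, payload TEXT)")
    db.commit()


def find_new_images(folder: Path, seen: Set[Path]) -> List[Path]:
    """Steps a-c: list images in the watched folder that were not processed yet."""
    return sorted(
        p for p in folder.iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES and p not in seen
    )


def process_image(
    image: Path,
    ocr: Callable[[Path], str],
    interpret: Callable[[str], Dict[str, str]],
    out_dir: Path,
    db: sqlite3.Connection,
    report: Path,
) -> Dict[str, str]:
    """Steps d-j for a single image."""
    raw_text = ocr(image)                      # step d: OCR
    fields = interpret(raw_text)               # steps e-g: LLM interpretation
    record = {"source": image.name, **fields}

    # Step h: structured JSON file, one per document.
    json_path = out_dir / f"{image.stem}.json"
    json_path.write_text(json.dumps(record, indent=2), encoding="utf-8")

    # Step j: database row holding the same payload.
    db.execute(
        "INSERT INTO documents (source, payload) VALUES (?, ?)",
        (image.name, json.dumps(record)),
    )
    db.commit()

    # Step i: append one summary line to a plain-text report.
    with report.open("a", encoding="utf-8") as fh:
        fh.write(f"{image.name}: {len(fields)} fields extracted\n")
    return record


def run_once(folder, seen, ocr, interpret, out_dir, db, report):
    """One polling pass: process every unseen image and remember it."""
    records = []
    for image in find_new_images(folder, seen):
        records.append(process_image(image, ocr, interpret, out_dir, db, report))
        seen.add(image)
    return records


def fake_ocr(image: Path) -> str:
    """Stand-in for an OCR engine; returns text with an ambiguous character."""
    return "NAME: JANE D0E\nID: 12345"


def fake_interpret(text: str) -> Dict[str, str]:
    """Stand-in for the LLM stage: parse 'KEY: value' lines, repair 0/O in names."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    if "NAME" in fields:
        fields["NAME"] = fields["NAME"].replace("0", "O")
    return fields
```

Calling `run_once` in a loop with a short sleep gives a simple polling monitor. Passing the OCR and LLM stages as callables keeps the sketch runnable without any external service and lets either stage be swapped independently.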