ERPA: Efficient RPA Model Integrating OCR and LLMs for Intelligent Document Processing
Osama Abdellaif, Abdelrahman Nader, Ali Hamdi
TL;DR
The paper addresses the speed and accuracy problems that traditional RPA solutions face when processing large volumes of immigration documents. It introduces ERPA, a multi-stage pipeline that couples OCR with fine-tuned LLMs to extract, interpret, and validate ID data, producing structured JSON and reports. In benchmarks against UiPath and Automation Anywhere, ERPA completes per-document extraction in about $9.94$ seconds, a reduction in processing time of up to $94\%$. Its adaptability to diverse document formats and its scalable design make it a practical, high-throughput option for government workflows that need fast and reliable document processing.
Abstract
This paper presents ERPA, an innovative Robotic Process Automation (RPA) model designed to enhance ID data extraction and optimize Optical Character Recognition (OCR) tasks within immigration workflows. Traditional RPA solutions often face performance limitations when processing large volumes of documents, leading to inefficiencies. ERPA addresses these challenges by incorporating Large Language Models (LLMs) to improve the accuracy and clarity of extracted text, effectively handling ambiguous characters and complex structures. Benchmark comparisons with leading platforms like UiPath and Automation Anywhere demonstrate that ERPA significantly reduces processing times by up to 94 percent, completing ID data extraction in just 9.94 seconds. These findings highlight ERPA's potential to revolutionize document automation, offering a faster and more reliable alternative to current RPA solutions.
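The flow summarized above (OCR, then LLM-based cleanup of ambiguous characters, then validation and structured JSON output) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the OCR engine, the LLM, or the field schema, so `fake_ocr`, `fake_llm_cleanup`, `validate`, `process_document`, the field names, and the validation rules below are all hypothetical stand-ins chosen for illustration.

```python
import json
import re
from typing import Callable, Dict, List


def fake_ocr(_image_path: str) -> str:
    """Hypothetical stand-in for an OCR engine; returns noisy raw text."""
    return "Name: JANE D0E\nID No: 12345678\nDate of Birth: 1990-O1-15"


def fake_llm_cleanup(raw_text: str) -> Dict[str, str]:
    """Hypothetical stand-in for the LLM step.

    A real system would prompt a language model; here simple rules mimic
    the idea of resolving ambiguous characters using field context.
    """
    fields: Dict[str, str] = {}
    for line in raw_text.splitlines():
        key, _, value = line.partition(":")
        fields[key.strip().lower().replace(" ", "_")] = value.strip()
    # Names should not contain digits: read a zero as the letter O.
    if "name" in fields:
        fields["name"] = fields["name"].replace("0", "O")
    # Dates should not contain letters: read the letter O as a zero.
    if "date_of_birth" in fields:
        fields["date_of_birth"] = fields["date_of_birth"].replace("O", "0")
    return fields


def validate(fields: Dict[str, str]) -> List[str]:
    """Check the cleaned fields against assumed format rules."""
    errors: List[str] = []
    if not re.fullmatch(r"\d{8}", fields.get("id_no", "")):
        errors.append("id_no must be exactly 8 digits")
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", fields.get("date_of_birth", "")):
        errors.append("date_of_birth must be YYYY-MM-DD")
    return errors


def process_document(
    image_path: str,
    ocr: Callable[[str], str] = fake_ocr,
    cleanup: Callable[[str], Dict[str, str]] = fake_llm_cleanup,
) -> str:
    """Run OCR, cleanup, and validation, then emit a JSON report."""
    fields = cleanup(ocr(image_path))
    errors = validate(fields)
    report = {"fields": fields, "valid": not errors, "errors": errors}
    return json.dumps(report, indent=2)


if __name__ == "__main__":
    print(process_document("id_card.png"))
```

Passing the OCR and cleanup steps as callables keeps the stages swappable, so a real OCR engine or LLM client could replace the stubs without changing the validation or reporting code.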
