Arctic-Extract Technical Report

Mateusz Chiliński; Julita Ołtusek; Wojciech Jaśkowski

TL;DR

Arctic-Extract addresses the need for capable document understanding on resource-limited hardware. It builds on a Qwen2.5-VL-inspired architecture with token compression, which enables a 128k-token context window and processing of up to 125 pages on a single A10 GPU. It combines this with LoRA fine-tuning and 4-bit AWQ quantization, keeping the model within 6.6 GiB. Evaluations on SQuAD2.0, DocVQA, multilingual benchmarks, and TE show performance competitive with much larger models, with strong multilingual and table-extraction capabilities. The results indicate a practical, scalable intelligent document processing (IDP) solution that balances accuracy, efficiency, and deployment cost for real-world document processing.
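The TL;DR mentions LoRA fine-tuning without further detail. As a rough illustration of the general technique, and not the authors' training code, the sketch below wraps a frozen weight matrix W with a trainable low-rank update (alpha / r) * B @ A. The class name `LoRALinear` and the `rank` and `alpha` defaults are hypothetical choices made for this example.

```python
import numpy as np


class LoRALinear:
    """Frozen dense weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight: np.ndarray, rank: int = 8, alpha: float = 16.0, seed: int = 0):
        out_features, in_features = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                                        # frozen pretrained weight, shape (out, in)
        self.A = rng.normal(0.0, 0.01, size=(rank, in_features))    # trainable down-projection
        self.B = np.zeros((out_features, rank))                     # trainable up-projection, zero-initialised
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape (..., in); the adapter path adds scale * x @ A.T @ B.T
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self) -> np.ndarray:
        # For deployment, the adapter can be folded into a single dense matrix
        return self.weight + self.scale * self.B @ self.A

    def trainable_parameters(self) -> int:
        # Only A and B are updated during fine-tuning; W stays frozen
        return self.A.size + self.B.size
```

Because B starts at zero, the wrapped layer initially reproduces the pretrained layer exactly, and only rank * (in + out) parameters are trained. For a 64 x 128 weight with rank 4 that is 768 trainable values instead of 8,192. After training, `merged_weight()` yields an ordinary dense matrix, which can then be quantized like any other weight.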

Abstract

Arctic-Extract is a state-of-the-art model designed for extracting structured data (question answering, entities, and tables) from scanned or born-digital business documents. Despite this performance, the model weighs only 6.6 GiB, making it deployable on resource-constrained hardware such as A10 GPUs with 24 GB of memory. On those GPUs, Arctic-Extract can process up to 125 A4 pages, making it suitable for long-document processing. This paper describes Arctic-Extract's training protocols and evaluation results, demonstrating its strong performance in document understanding.
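To make the 6.6 GiB / 24 GB / 125-page figures concrete, the back-of-the-envelope sketch below estimates how a 24 GB A10 might be split between the quantized weights and a 16-bit key/value cache for a full context. All architecture numbers are hypothetical: 28 layers, 4 key/value heads, and head dimension 128 are chosen to resemble a 7B Qwen2.5-style decoder with grouped-query attention, "128k" is taken as 131,072 tokens, and the whole context is assumed to be spent on page tokens. None of these values is stated in the report excerpt.

```python
GIB = 1024 ** 3  # bytes per GiB (binary); GPU memory is usually quoted in GB (10**9 bytes)


def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    """Bytes of key/value cache added per token (factor 2 = one key and one value vector)."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


def memory_budget(gpu_gb: float = 24.0, weights_gib: float = 6.6,
                  context_tokens: int = 128 * 1024, pages: int = 125,
                  num_layers: int = 28, num_kv_heads: int = 4,
                  head_dim: int = 128) -> dict:
    """Rough GPU memory split; the architecture defaults are hypothetical."""
    gpu_bytes = gpu_gb * 10 ** 9
    weight_bytes = weights_gib * GIB
    # Weight-only quantization (as in AWQ) is assumed to leave the KV cache in 16-bit
    kv_bytes = context_tokens * kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim)
    return {
        "gpu_gib": gpu_bytes / GIB,
        "weights_gib": weights_gib,
        "kv_cache_gib": kv_bytes / GIB,
        # Left for activations, the vision encoder, and framework overhead
        "headroom_gib": (gpu_bytes - weight_bytes - kv_bytes) / GIB,
        "tokens_per_page": context_tokens / pages,
    }


if __name__ == "__main__":
    for name, value in memory_budget().items():
        print(f"{name:>15}: {value:8.2f}")
```

Under these assumptions, 24 GB is about 22.35 GiB. The full-context cache takes exactly 7 GiB, which leaves roughly 8.75 GiB after the 6.6 GiB of weights, and each of the 125 pages gets about 1,049 tokens. That per-page budget is tight for a full-resolution A4 scan, which suggests why token compression matters for long documents.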
