Arctic-Extract Technical Report
Mateusz Chiliński, Julita Ołtusek, Wojciech Jaśkowski
TL;DR
Arctic-Extract targets capable document understanding on resource-limited hardware. It uses a Qwen2.5-VL-inspired architecture with token compression, which enables a 128k-token context window and processing of up to 125 pages on an A10 GPU, combined with LoRA fine-tuning and 4-bit AWQ quantization to keep the model within 6.6 GiB. Evaluations on SQuAD2.0, DocVQA, multilingual benchmarks, and table extraction (TE) show performance competitive with much larger models, with strong multilingual and table-extraction capabilities. The results point to a practical, scalable intelligent document processing (IDP) solution that balances accuracy, efficiency, and deployment cost for real-world document processing.
Abstract
Arctic-Extract is a state-of-the-art model designed for extracting structured data (question answering, entities, and tables) from scanned or born-digital business documents. Despite its SoTA capabilities, the model weighs only 6.6 GiB, making it deployable on resource-constrained hardware such as A10 GPUs with 24 GB of memory. Arctic-Extract can process up to 125 A4 pages on those GPUs, making it suitable for long-document processing. This paper describes Arctic-Extract's training protocols and evaluation results, demonstrating its strong performance in document understanding.
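
The figures quoted above (128k context, 125 pages, 6.6 GiB, 24 GB) imply a rough per-page token budget and a memory headroom, which can be sketched as a back-of-envelope calculation. This is an illustrative estimate, not a description of the model's actual allocation: it assumes "128k" means 128 × 1024 tokens, that "24 GB" is decimal gigabytes, and that the whole context window is spent on page content. The function names are hypothetical.

```python
# Back-of-envelope budget built only from figures quoted in the text.
# Assumptions (not stated in the source): "128k" = 128 * 1024 tokens,
# "24 GB" = 24 * 10**9 bytes, and the full window is spent on pages.

CONTEXT_TOKENS = 128 * 1024  # assumed reading of "128k context window"
MAX_PAGES = 125              # pages processed on an A10, from the text
MODEL_GIB = 6.6              # model size, from the text
GPU_GB = 24                  # A10 memory, from the text


def tokens_per_page(context_tokens: int, pages: int) -> float:
    """Average token budget per page, ignoring prompt and output tokens."""
    return context_tokens / pages


def gb_to_gib(gb: float) -> float:
    """Convert decimal gigabytes (10**9 bytes) to gibibytes (2**30 bytes)."""
    return gb * 10**9 / 2**30


def headroom_gib(gpu_gb: float, model_gib: float) -> float:
    """Upper bound on memory left after the 6.6 GiB figure is subtracted."""
    return gb_to_gib(gpu_gb) - model_gib


if __name__ == "__main__":
    print(f"tokens per page: {tokens_per_page(CONTEXT_TOKENS, MAX_PAGES):.1f}")
    print(f"memory headroom: {headroom_gib(GPU_GB, MODEL_GIB):.2f} GiB")
```

Under these assumptions, each page gets on the order of a thousand tokens, which is why token compression matters for fitting 125 pages into a single window. The headroom figure is only an upper bound, since the text does not say whether 6.6 GiB covers runtime memory as well as the weights.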
