
Arctic-Extract Technical Report

Mateusz Chiliński, Julita Ołtusek, Wojciech Jaśkowski

TL;DR

Arctic-Extract addresses the need for capable document understanding on resource-limited hardware. It leverages a Qwen2.5-VL-inspired architecture with token compression enabling a 128k context window and 125-page processing on an A10 GPU, combined with LoRA fine-tuning and 4-bit AWQ quantization to stay within 6.6 GiB. Evaluations across SQuAD2.0, DocVQA, multilingual benchmarks, and TE show competitive performance relative to much larger models, with strong multilingual and table-extraction capabilities. The results indicate a practical, scalable IDP solution that balances accuracy, efficiency, and deployment cost for real-world document processing.

Abstract

Arctic-Extract is a state-of-the-art model designed for extracting structured data (answers to questions, entities, and tables) from scanned or born-digital business documents. Despite its state-of-the-art capabilities, the model weighs only 6.6 GiB, so it can be deployed on resource-constrained hardware such as A10 GPUs with 24 GB of memory. On these GPUs, Arctic-Extract can process up to 125 A4 pages, making it suitable for long-document processing. This paper describes Arctic-Extract's training protocols and evaluation results, demonstrating its strong performance in document understanding.
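
As a rough illustration of the budget implied by these figures, the sketch below divides the 128k-token context window across 125 pages and subtracts the quoted 6.6 GiB weight size from the A10's memory. It is a hypothetical back-of-envelope calculation, not the authors' method. It assumes that "128k" means 128 × 1024 tokens, that the context is shared evenly across pages, and that "24 GB" means 24 × 10^9 bytes. The helper names are illustrative only.

```python
# Back-of-envelope budget using figures quoted in the TL;DR and abstract.
# Assumptions (not from the paper): "128k" = 128 * 1024 tokens, the context
# is split evenly across pages, and "24 GB" = 24 * 10**9 bytes.

GIB = 2**30  # bytes per GiB


def tokens_per_page(context_tokens: int, pages: int) -> float:
    """Average tokens available per page if the context is split evenly."""
    return context_tokens / pages


def free_memory_gib(gpu_bytes: int, weights_gib: float) -> float:
    """GPU memory left after loading the weights, in GiB."""
    return gpu_bytes / GIB - weights_gib


if __name__ == "__main__":
    context = 128 * 1024
    print(f"{tokens_per_page(context, 125):.1f} tokens/page")
    print(f"{free_memory_gib(24 * 10**9, 6.6):.2f} GiB left for activations, KV cache and runtime overhead")
```

Under these assumptions, each page gets roughly 1,050 tokens on average, and about 15.75 GiB remains after the weights are loaded. The actual per-page token count and memory use depend on the token compression and serving setup described in the paper.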

Paper Structure

This paper contains 34 sections, 4 figures, 17 tables.

Figures (4)

  • Figure 1: Example of table with subsections.
  • Figure 2: Example of table with hierarchical header.
  • Figure 3: Example of table transposition.
  • Figure 4: Example of multiple table sources to join.