Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs

Gaye Colakoglu, Gürkan Solmaz, Jonathan Fürst

TL;DR

This work defines a design space for information extraction (IE) from layout-rich documents using large language models (LLMs), organizing the space into data structuring, model engagement, and output refinement. It introduces LayIE-LLM, an open-source test suite that systematically studies how input representations, chunking, prompting, and post-processing affect IE performance across diverse models and datasets. Through one-factor-at-a-time (OFAT) and brute-force benchmarking, the study shows that well-configured general-purpose LLMs can rival fine-tuned layout-aware models at lower labeling and computation costs, with multimodal LLMs offering the best performance at higher cost. The results provide practical guidance for configuring IE pipelines and deliver LayIE-LLM as a reproducible, extensible tool to benchmark future layout-aware IE approaches.

Abstract

This paper defines and explores the design space for information extraction (IE) from layout-rich documents using large language models (LLMs). The three core challenges of layout-aware IE with LLMs are 1) data structuring, 2) model engagement, and 3) output refinement. Our study investigates the sub-problems and methods within these core challenges, such as input representation, chunking, prompting, selection of LLMs, and multimodal models. It examines the effect of different design choices through LayIE-LLM, a new, open-source, layout-aware IE test suite, benchmarking against traditional, fine-tuned IE models. The results on two IE datasets show that LLMs require adjustment of the IE pipeline to achieve competitive performance: the optimized configuration found with LayIE-LLM achieves 13.3 to 37.5 F1 points more than a general-practice baseline configuration using the same LLM. To find a well-working configuration, we develop a one-factor-at-a-time (OFAT) method that achieves near-optimal results: it scores only 0.8 to 1.8 points below the best configuration found by full factorial exploration while requiring a fraction (2.8%) of the computation. Overall, we demonstrate that, if well-configured, general-purpose LLMs match the performance of specialized models, providing a cost-effective, fine-tuning-free alternative. Our test suite is available at https://github.com/gayecolakoglu/LayIE-LLM.
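
To make the contrast between OFAT and full factorial exploration concrete, here is a minimal sketch of both search strategies over a small configuration space. It is an illustration under stated assumptions, not the LayIE-LLM implementation: the factor names, their values, the baseline, and the `toy_evaluate` scoring function are all hypothetical, and the sequential "keep the best value, then move to the next factor" variant shown here may differ from the exact procedure used in the paper.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Tuple

Config = Dict[str, Any]

# Hypothetical design space; names and values are illustrative only.
SPACE: Dict[str, List[Any]] = {
    "input_format": ["text", "markdown"],
    "chunk_size": [1024, 2048, 4096],
    "prompt": ["few_shot", "cot"],
    "n_examples": [0, 1, 3, 5],
}
BASELINE: Config = {
    "input_format": "text",
    "chunk_size": 4096,
    "prompt": "few_shot",
    "n_examples": 0,
}


def toy_evaluate(cfg: Config) -> float:
    """Made-up additive score standing in for an F1 measurement."""
    score = 0.5
    score += 0.05 if cfg["input_format"] == "markdown" else 0.0
    score += {1024: 0.0, 2048: 0.03, 4096: 0.01}[cfg["chunk_size"]]
    score += 0.02 if cfg["prompt"] == "cot" else 0.0
    score += 0.01 * cfg["n_examples"]
    return score


def ofat_search(
    space: Dict[str, List[Any]],
    baseline: Config,
    evaluate: Callable[[Config], float],
) -> Tuple[Config, float, int]:
    """Vary one factor at a time, keeping the best value found so far."""
    best = dict(baseline)
    best_score = evaluate(best)
    n_evals = 1
    for factor, values in space.items():
        for value in values:
            if value == baseline[factor]:
                continue  # the baseline value was already scored
            candidate = {**best, factor: value}
            score = evaluate(candidate)
            n_evals += 1
            if score > best_score:
                best, best_score = candidate, score
    return best, best_score, n_evals


def full_factorial_search(
    space: Dict[str, List[Any]],
    evaluate: Callable[[Config], float],
) -> Tuple[Config, float, int]:
    """Score every combination of factor values (brute force)."""
    names = list(space)
    best: Config = {}
    best_score = float("-inf")
    n_evals = 0
    for combo in product(*(space[name] for name in names)):
        candidate = dict(zip(names, combo))
        score = evaluate(candidate)
        n_evals += 1
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score, n_evals


if __name__ == "__main__":
    print("OFAT:", ofat_search(SPACE, BASELINE, toy_evaluate))
    print("Full factorial:", full_factorial_search(SPACE, toy_evaluate))
```

In this toy space OFAT needs 8 evaluations against 48 for the full grid, because its cost grows with the sum of the factor sizes rather than their product. The toy score is purely additive, so OFAT recovers the same configuration as brute force here; when factors interact, OFAT can miss the true optimum, which is consistent with the small gap to full factorial exploration reported in the abstract.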

Paper Structure

This paper contains 29 sections, 1 equation, 10 figures, 17 tables.

Figures (10)

  • Figure 1: Design space for IE from layout-rich documents using LLMs. The goal is to extract information relevant to the target data schema with correct mapping.
  • Figure 2: LayIE-LLM test suite for extracting information from layout-rich documents (LRDs) using LLMs in six stages. The process begins with OCR-based text extraction and Markdown conversion with LLM assistance, followed by chunking to manage token limits, experimenting with different chunk sizes. Each chunk is processed into a prompt using few-shot and chain-of-thought (CoT) strategies with varying example counts. Prompts include a document example with key-value pairs, plus a new document and task. LLMs generate structured JSON outputs, which are decoded and reconciled. Post-processing includes data cleaning and entity mapping, followed by evaluation using two methods.
  • Figure 3: F1 scores of different LLMs across three configurations (Baseline, OFAT, and Brute-Force). Each bar represents the mean F1 score of a model for the corresponding configuration. The OFAT configuration uses a single global parameter set optimized across all models.
  • Figure 4: Few-Shot Prompt Structure with 1-shot Example.
  • Figure 5: Chain of Thought Prompt Structure with 1-shot Example.
  • ...and 5 more figures
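
The chunking, decoding, and reconciliation stages named in the Figure 2 caption can be sketched as follows. This is a simplified illustration, not the LayIE-LLM code: the helper names `chunk_words`, `decode_output`, and `reconcile` are hypothetical, chunk size is counted in whitespace-separated words as a stand-in for model tokens, and the majority-vote merge is an assumed reconciliation rule, since the caption does not specify one.

```python
import json
from collections import Counter
from typing import Any, Dict, List


def chunk_words(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words.

    Assumption: words approximate tokens; a real pipeline would count
    tokens with the target model's tokenizer.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def decode_output(raw: str) -> Dict[str, Any]:
    """Parse one model response as a JSON object; return {} on failure."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    return parsed if isinstance(parsed, dict) else {}


def reconcile(chunk_outputs: List[Dict[str, Any]]) -> Dict[str, str]:
    """Merge per-chunk predictions into one record.

    Assumed rule: for each key, take the most frequent non-empty value
    across chunks; ties go to the value seen first.
    """
    votes: Dict[str, Counter] = {}
    for output in chunk_outputs:
        for key, value in output.items():
            if value in (None, ""):
                continue  # a chunk that did not contain this field
            votes.setdefault(key, Counter())[str(value).strip()] += 1
    return {
        key: counter.most_common(1)[0][0]
        for key, counter in votes.items()
    }


if __name__ == "__main__":
    # Stand-in responses; in a real run each would come from an LLM call
    # on one chunk of the document.
    responses = [
        '{"date": "2020-01-01", "total": null}',
        '{"date": "2020-01-01", "total": "42"}',
        "not valid json",
    ]
    print(chunk_words("one two three four five", max_words=2))
    print(reconcile([decode_output(r) for r in responses]))
```

Returning an empty dictionary for undecodable responses keeps one malformed chunk from discarding the whole document, at the price of silently losing whatever that chunk contained.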