
PAGED: A Benchmark for Procedural Graphs Extraction from Documents

Weihong Du, Wenrui Liao, Hongru Liang, Wenqiang Lei

TL;DR

PAGED introduces a standardized benchmark for automatic procedural graph extraction from documents. It assembles the largest public high-quality dataset, 3,394 document-graph pairs derived from business process graphs, and presents a three-stage Data2Text pipeline (decomposition/transformation, grouping/ordering, aggregating/smoothing) that converts graphs into coherent procedural documents. It also provides automatic and human evaluation metrics. The study systematically evaluates five baselines and three advanced LLMs. Existing methods are held back by their reliance on hand-written rules and small datasets, while LLMs are better at extracting textual elements but struggle to organize non-sequential logical structure; a novel self-refine strategy with condition and parallel verifiers improves the LLMs' gateway reasoning. PAGED offers a major benchmark for measuring progress in procedural graph extraction and insights into logical reasoning among non-sequential elements, with implications for pretraining LLMs on procedural knowledge and improving complex graph understanding.
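The first Data2Text stage (decomposing a graph into units and transforming each unit into a procedural fragment) can be pictured with a small sketch. Everything in it is an assumption made for illustration: the node kinds, the `Node`/`Graph` classes, the `decompose`/`transform` helpers, and the sentence templates are hypothetical and not taken from the paper, which may define units and templates differently.

```python
from dataclasses import dataclass, field

# Hypothetical node kinds: "start", "end", "action",
# "xor" (exclusive gateway) and "and" (parallel gateway).
@dataclass
class Node:
    node_id: str
    kind: str
    label: str = ""  # actions carry a clause, e.g. "the waiter takes the order"


@dataclass
class Graph:
    nodes: dict  # node_id -> Node, in insertion order
    edges: list = field(default_factory=list)  # (source_id, target_id, condition)

    def successors(self, node_id):
        """Return (target Node, condition) pairs for every outgoing flow."""
        return [(self.nodes[t], cond) for s, t, cond in self.edges if s == node_id]


def decompose(graph):
    """Cut the graph into units: each action node plus its outgoing flow.

    Assumes BPMN-style graphs where an action has one outgoing flow,
    branching happens only through gateways, and gateway branches lead
    directly to actions.
    """
    return [
        (node, graph.successors(node.node_id))
        for node in graph.nodes.values()
        if node.kind == "action" and graph.successors(node.node_id)
    ]


def transform(graph, unit):
    """Render one unit as a procedural fragment using a fixed template."""
    node, succ = unit
    target, _ = succ[0]
    if target.kind == "action":
        return f"After {node.label}, {target.label}."
    if target.kind == "end":
        return f"The procedure ends after {node.label}."
    # The action leads into a gateway: describe the gateway's branches.
    branches = graph.successors(target.node_id)
    if target.kind == "xor":
        parts = [f"if {cond}, {b.label}" for b, cond in branches]
        return f"After {node.label}, " + "; ".join(parts) + "."
    if target.kind == "and":
        parts = [b.label for b, _ in branches]
        return f"After {node.label}, the following happen in parallel: " + "; ".join(parts) + "."
    raise ValueError(f"unsupported node kind: {target.kind}")


restaurant = Graph(
    nodes={n.node_id: n for n in [
        Node("s", "start"),
        Node("a1", "action", "the waiter takes the order"),
        Node("g1", "xor"),
        Node("a2", "action", "the chef cooks the dish"),
        Node("a3", "action", "the waiter apologizes to the customer"),
        Node("e", "end"),
    ]},
    edges=[("s", "a1", ""), ("a1", "g1", ""),
           ("g1", "a2", "the dish is available"),
           ("g1", "a3", "the dish is sold out"),
           ("a2", "e", ""), ("a3", "e", "")],
)
for unit in decompose(restaurant):
    print(transform(restaurant, unit))
```

Template fragments like these are stilted and disconnected on their own; the later grouping/ordering and aggregating/smoothing stages are what turn them into a fluent document, and they are outside this sketch.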

Abstract

Automatic extraction of procedural graphs from documents offers users a low-cost way to understand a complex procedure by skimming a visual graph. Despite progress in recent studies, two questions remain unanswered: whether existing studies have solved this task well (Q1), and whether emerging large language models (LLMs) can bring new opportunities to it (Q2). To this end, we propose a new benchmark, PAGED, equipped with a large high-quality dataset and standard evaluations. It investigates five state-of-the-art baselines and reveals that they fall short of extracting optimal procedural graphs because of their heavy reliance on hand-written rules and the limited available data. We further involve three advanced LLMs in PAGED and enhance them with a novel self-refine strategy. The results point out the advantages of LLMs in identifying textual elements and their gaps in building logical structures. We hope PAGED can serve as a major landmark for automatic procedural graph extraction, and that the investigations in PAGED can offer insights into research on logical reasoning among non-sequential elements.
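Scoring how well textual elements (for instance, action labels) are identified requires matching predicted elements against gold ones whose wording may differ slightly. One simple option is a soft-matching F1. The sketch below is a hypothetical illustration built on the standard-library `difflib.SequenceMatcher` with an arbitrary 0.8 threshold; the function names and the greedy matching rule are assumptions, and it is not claimed to match the metrics PAGED defines.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Character-level similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def soft_f1(predicted, gold, threshold=0.8):
    """F1 with greedy one-to-one matching of predicted to gold elements.

    A predicted element counts as correct if its most similar unmatched
    gold element scores at least `threshold`; that gold element is then
    consumed so it cannot be matched twice.
    """
    unmatched = list(gold)
    hits = 0
    for p in predicted:
        best = max(unmatched, key=lambda g: similarity(p, g), default=None)
        if best is not None and similarity(p, best) >= threshold:
            hits += 1
            unmatched.remove(best)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


pred = ["Take the order", "cook the dish."]
gold = ["take the order", "cook the dish", "serve the dish"]
print(round(soft_f1(pred, gold), 3))
```

Greedy matching is cheap but order-dependent; an optimal assignment (for example, the Hungarian algorithm) would avoid a poor early match blocking a better later one, at the cost of extra complexity.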

Paper Structure (46 sections, 1 equation, 11 figures, 4 tables)

Figures (11)

  • Figure 1: How a restaurant serves its customers, shown as a procedural graph (a) and as a document (b).
  • Figure 2: Illustration of decomposing the graph into units and transforming a unit into a procedural fragment.
  • Figure 3: The grouping of procedural fragments using a pre-trained boundary identification model.
  • Figure 4: Comparison of our method with two variations via automatic and human evaluations.
  • Figure 5: The self-refine strategy, in which "System1" extracts procedural graphs and "System2" verifies the graphs' gateways and provides feedback for refinement.
  • ...and 6 more figures
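The self-refine strategy in Figure 5 can be outlined as a control-flow sketch. Here `toy_extract` is a hypothetical stand-in for the System1 LLM, and `condition_verifier` and `parallel_verifier` are hypothetical rule-based checks standing in for the paper's condition and parallel verifiers (System2), whose actual form may differ. The dictionary graph format is also an assumption.

```python
# Hypothetical graph format: a list of gateways, each with a type
# ("xor" = exclusive, "and" = parallel) and (condition, action) branches.

def condition_verifier(graph):
    """Flag exclusive-gateway branches that have no condition."""
    issues = []
    for gw in graph["gateways"]:
        if gw["type"] == "xor":
            for cond, action in gw["branches"]:
                if not cond:
                    issues.append(f"{gw['id']}: branch to '{action}' lacks a condition")
    return issues


def parallel_verifier(graph):
    """Flag parallel gateways with fewer than two branches."""
    return [
        f"{gw['id']}: parallel gateway needs at least two branches"
        for gw in graph["gateways"]
        if gw["type"] == "and" and len(gw["branches"]) < 2
    ]


def self_refine(document, extract, verifiers, max_rounds=3):
    """System1 extracts a graph; System2 verifies gateways and feeds issues back."""
    graph = extract(document, [])
    for _ in range(max_rounds):
        feedback = [msg for verify in verifiers for msg in verify(graph)]
        if not feedback:
            return graph, []
        graph = extract(document, feedback)
    # Re-check the last refinement so the caller sees any remaining issues.
    return graph, [msg for verify in verifiers for msg in verify(graph)]


def toy_extract(document, feedback):
    """Stand-in for System1: omits a branch condition until told about it."""
    missing = "" if not feedback else "the dish is sold out"
    return {"gateways": [{
        "id": "g1",
        "type": "xor",
        "branches": [("the dish is available", "cook the dish"),
                     (missing, "apologize to the customer")],
    }]}


graph, remaining = self_refine("restaurant procedure document", toy_extract,
                               [condition_verifier, parallel_verifier])
print(graph["gateways"][0]["branches"][1][0], remaining)
```

Capping the number of rounds keeps the loop from running indefinitely when the extractor cannot resolve an issue, and the final re-check returns whatever problems persist instead of silently accepting the last graph.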