TreeForm: End-to-end Annotation and Evaluation for Form Document Parsing

Ran Zmigrod, Zhiqiang Ma, Armineh Nourbakhsh, Sameena Shah

TL;DR

This work addresses end-to-end form parsing within Visually Rich Form Understanding (VRFU) by identifying limitations in FUNSD-type annotations and proposing TreeForm, a JSON-encoded tree representation that captures hierarchical and tabular form structure. It introduces a novel end-to-end F1 metric and the greedy-aligned tree-edit distance (GAnTED) for holistic evaluation, along with a method to convert FUNSD annotations into TreeForm. Baselines using LayoutXLM and Donut on FUNSD/XFUND demonstrate the viability of TreeForm and reveal tradeoffs between labeling accuracy, edge linking, and structural understanding. By standardizing both annotation and evaluation, TreeForm aims to spur deeper research into annotating, modeling, and evaluating complex form-like documents, and it motivates the creation of a dedicated TreeForm dataset.

Abstract

Visually Rich Form Understanding (VRFU) poses a complex research problem due to the documents' highly structured nature and yet highly variable style and content. Current annotation schemes decompose form understanding and omit key hierarchical structure, making development and evaluation of end-to-end models difficult. In this paper, we propose a novel F1 metric to evaluate form parsers and describe a new content-agnostic, tree-based annotation scheme for VRFU: TreeForm. We provide methods to convert previous annotation schemes into TreeForm structures and evaluate TreeForm predictions using a modified version of the normalized tree-edit distance. We present initial baselines for our end-to-end performance metric and the TreeForm edit distance, averaged over the FUNSD and XFUND datasets, of 61.5 and 26.4 respectively. We hope that TreeForm encourages deeper research in annotating, modeling, and evaluating the complexities of form-like documents.
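To make the idea of a tree-shaped form annotation and a tree-level F1 score more concrete, the following is a minimal sketch under stated assumptions. The field names (`label`, `text`, `children`), the helpers `flatten` and `tree_f1`, and the sample form are all hypothetical; this is neither the TreeForm schema nor the paper's end-to-end F1 or GAnTED metric, only a simple exact-match F1 over (parent, label, text) triples.

```python
import copy
from collections import Counter

# Hypothetical nested form: header -> question -> answer.
# The keys "label", "text", "children" are illustrative assumptions,
# not the schema defined in the paper.
gold = {
    "label": "header", "text": "Contact",
    "children": [
        {"label": "question", "text": "Name",
         "children": [{"label": "answer", "text": "Ada", "children": []}]},
        {"label": "question", "text": "Phone",
         "children": [{"label": "answer", "text": "555-0100", "children": []}]},
    ],
}

# A prediction that gets the structure right but one answer wrong.
pred = copy.deepcopy(gold)
pred["children"][1]["children"][0]["text"] = "555-0199"


def flatten(node, parent_text=None):
    """Yield a (parent_text, label, text) triple for every node in the tree."""
    yield (parent_text, node["label"], node["text"])
    for child in node.get("children", []):
        yield from flatten(child, node["text"])


def tree_f1(gold_tree, pred_tree):
    """Exact-match F1 over node triples (a stand-in, not the paper's metric)."""
    g = Counter(flatten(gold_tree))
    p = Counter(flatten(pred_tree))
    overlap = sum((g & p).values())  # multiset intersection size
    if overlap == 0:
        return 0.0
    precision = overlap / sum(p.values())
    recall = overlap / sum(g.values())
    return 2 * precision * recall / (precision + recall)


print(round(tree_f1(gold, pred), 2))  # 4 of 5 triples match on each side
```

Because each triple includes the parent's text, a node only counts as correct when it is attached to the right parent, so the score reflects both content and linking errors. A tree-edit distance would additionally capture how far a wrong structure is from the gold one.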

Paper Structure (28 sections, 1 equation, 2 figures, 2 tables)

Figures (2)

  • Figure 1: Excerpt of a FUNSD form. Headers are marked in burgundy, questions in green, and answers in blue. Entity links provided by the FUNSD annotation scheme are marked in orange. Links in pink were created for TreeForm.
  • Figure 2: Different annotation schemes for the excerpt of the FUNSD form given in Figure 1.