TABLET: Table Structure Recognition using Encoder-only Transformers

Qiyu Hou, Jun Wang

TL;DR

TABLET introduces a Split-Merge framework using encoder-only Transformers to tackle table structure recognition in large, dense tables. The split model performs horizontal and vertical line splitting via dual 1D Transformers on high-resolution feature streams, while the merge model uses RoIAlign-based grid-cell features and a Transformer with 2D positional embeddings to classify grid cells into OTSL tokens, producing HTML layouts. Extensive experiments on FinTabNet and PubTabNet show superior accuracy and competitive TEDS scores, with strong robustness against misalignment and much faster inference than autoregressive approaches. The approach is well-suited for industrial deployment due to high accuracy, reduced resolution loss, and fast processing speeds, even on large-scale business documents.
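
To make the last step of the merge stage concrete, the sketch below shows one way a grid of OTSL tokens could be turned into an HTML table skeleton. It is an illustration, not the authors' code: the function name `otsl_to_html` is hypothetical, the token meanings ("C" starts a cell, "L" extends the cell to the left, "U" extends the cell above, "X" extends both) follow the usual OTSL vocabulary, and the grid is assumed to be rectangular and well-formed.

```python
from typing import List


def otsl_to_html(grid: List[List[str]]) -> str:
    """Convert a rectangular grid of OTSL tokens into an HTML table skeleton.

    Assumed token meanings: "C" starts a new cell, "L" extends the cell
    to its left, "U" extends the cell above, "X" extends both.
    """
    n_rows = len(grid)
    n_cols = len(grid[0]) if grid else 0
    rows_html = []
    for r in range(n_rows):
        cells = []
        for c in range(n_cols):
            if grid[r][c] != "C":
                continue  # this position is covered by a span starting elsewhere
            # Grow the column span while the tokens to the right are "L".
            colspan = 1
            while c + colspan < n_cols and grid[r][c + colspan] == "L":
                colspan += 1
            # Grow the row span while the tokens below are "U".
            rowspan = 1
            while r + rowspan < n_rows and grid[r + rowspan][c] == "U":
                rowspan += 1
            attrs = ""
            if rowspan > 1:
                attrs += f' rowspan="{rowspan}"'
            if colspan > 1:
                attrs += f' colspan="{colspan}"'
            cells.append(f"<td{attrs}></td>")
        rows_html.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>" + "".join(rows_html) + "</table>"


if __name__ == "__main__":
    # A 2x3 grid: the first cell spans two columns, the last spans two rows.
    print(otsl_to_html([["C", "L", "C"], ["C", "C", "U"]]))
```

Because every grid position receives exactly one token, the structure is recovered in a single pass over the grid, with no autoregressive decoding involved.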

Abstract

To address the challenges of table structure recognition, we propose a novel Split-Merge-based top-down model optimized for large, densely populated tables. Our approach formulates row and column splitting as sequence labeling tasks, utilizing dual Transformer encoders to capture feature interactions. The merging process is framed as a grid cell classification task, leveraging an additional Transformer encoder to ensure accurate and coherent merging. By eliminating unstable bounding box predictions, our method reduces resolution loss and computational complexity, achieving high accuracy while maintaining fast processing speed. Extensive experiments on FinTabNet and PubTabNet demonstrate the superiority of our model over existing approaches, particularly in real-world applications. Our method offers a robust, scalable, and efficient solution for large-scale table recognition, making it well-suited for industrial deployment.
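
As a rough illustration of treating splitting as sequence labeling, the sketch below takes a per-position binary label sequence (1 for "separator", 0 for "content") along one axis and returns one split position per run of separator labels. The post-processing rule (taking the centre of each run) and the name `labels_to_split_positions` are assumptions made for this example; the abstract does not specify how labels are converted into split lines.

```python
from typing import List


def labels_to_split_positions(labels: List[int]) -> List[float]:
    """Return the centre index of each maximal run of 1s (separator labels).

    Assumption for illustration: one split line is placed at the centre
    of every contiguous run of positions labelled as separator.
    """
    positions: List[float] = []
    start = None  # index where the current run of 1s began, if any
    for i, label in enumerate(labels):
        if label == 1 and start is None:
            start = i
        elif label != 1 and start is not None:
            positions.append((start + i - 1) / 2)
            start = None
    if start is not None:
        # The sequence ended while still inside a separator run.
        positions.append((start + len(labels) - 1) / 2)
    return positions


if __name__ == "__main__":
    # Two separator runs: indices 2..4 and index 7.
    print(labels_to_split_positions([0, 0, 1, 1, 1, 0, 0, 1, 0]))
```

Applying the same function to the row-wise and the column-wise label sequences would yield the horizontal and vertical split lines that define the grid passed to the merge stage.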

Paper Structure

This paper contains 20 sections, 2 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: An example of a densely populated table from a financial announcement and its corresponding table structure recognition.
  • Figure 2: System framework.
  • Figure 3: An example of a recognition error caused by the overlap of text regions between adjacent columns.
  • Figure 4: An example of a column header completely misaligned with its corresponding content below it.
  • Figure 5: An example where text spanning multiple lines within a single cell is incorrectly split into separate rows.
  • ...and 5 more figures