SEMv2: Table Separation Line Detection Based on Instance Segmentation

Zhenrong Zhang, Pengfei Hu, Jiefeng Ma, Jun Du, Jianshu Zhang, Huihui Zhu, Baocai Yin, Bing Yin, Cong Liu

TL;DR

SEMv2 treats table structure recognition (TSR) as a split-and-merge problem in which the split stage uses instance segmentation to detect individual table separation lines. A dedicated splitter uses a Gather-based kernel design to predict line masks, an embedder with RoIAlign and a transformer produces grid-level features, and a parallel, conditional-convolution merger assembles grids into complete table cells. The authors introduce iFLYTAB, a large-scale, diverse TSR dataset with both wired and wireless tables and rich geometric annotations, to benchmark robustness in real-world scenarios. Across SciTSR, PubTabNet, cTDaR, WTW, and iFLYTAB, SEMv2 achieves state-of-the-art performance, and ablations confirm the effectiveness of the instance-segmentation splitter, the Gather module, and the parallel merger. Limitations include handling severe rotations and multi-line text content without text-aware features, which motivates future multi-modal enhancements and moiré/distortion mitigation for improved generalization.

Abstract

Table structure recognition is an indispensable element for enabling machines to comprehend tables. Its primary purpose is to identify the internal structure of a table. Nevertheless, because table structures and styles are complex and diverse, it is highly challenging to parse tabular data into a structured format that machines can comprehend. In this work, we adhere to the principle of split-and-merge based methods and propose an accurate table structure recognizer, termed SEMv2 (SEM: Split, Embed and Merge). Unlike previous work on the "split" stage, we aim to address the instance-level discrimination problem for table separation lines and introduce a table separation line detection strategy based on conditional convolution. Specifically, we design the "split" stage in a top-down manner: it first detects each table separation line instance and then dynamically predicts the table separation line mask for that instance. The final table separation line shape can be accurately obtained by processing the table separation line mask in a row-wise/column-wise manner. To comprehensively evaluate SEMv2, we also present a more challenging dataset for table structure recognition, dubbed iFLYTAB, which encompasses tables of multiple styles in various scenarios such as photos, scanned documents, etc. Extensive experiments on publicly available datasets (e.g., SciTSR, PubTabNet and iFLYTAB) demonstrate the efficacy of our proposed approach. The code and iFLYTAB dataset are available at https://github.com/ZZR8066/SEMv2.
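The top-down "split" idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the 1x1 dynamic kernels, the toy feature map, and the use of a per-row or per-column argmax to trace a line are all assumptions made for illustration. It only shows how one kernel per detected line instance could yield one mask per instance, and how a mask could then be reduced to a line shape row by row or column by column.

```python
import numpy as np


def dynamic_line_masks(features, kernels):
    """Apply one 1x1 kernel per line instance to a shared feature map.

    features: (C, H, W) feature map shared by all instances.
    kernels:  (K, C) one dynamically predicted kernel per line instance
              (hypothetical shape, chosen for this sketch).
    returns:  (K, H, W) mask probabilities, one mask per instance.
    """
    logits = np.einsum("kc,chw->khw", kernels, features)
    return 1.0 / (1.0 + np.exp(-logits))


def trace_column_line(mask):
    """Row-wise processing: for each image row, the x with the highest response."""
    return mask.argmax(axis=1)


def trace_row_line(mask):
    """Column-wise processing: for each image column, the y with the highest response."""
    return mask.argmax(axis=0)


# Toy feature map: channel 0 responds to a slightly bent column separator,
# channel 1 responds to a straight row separator at y = 1.
features = np.zeros((2, 4, 6))
features[0, np.arange(4), [2, 2, 3, 3]] = 1.0
features[1, 1, :] = 1.0

# Two hypothetical instance kernels, each selecting one channel.
kernels = np.array([[8.0, 0.0], [0.0, 8.0]])

masks = dynamic_line_masks(features, kernels)
col_x = trace_column_line(masks[0])  # x position of the column separator per row
row_y = trace_row_line(masks[1])     # y position of the row separator per column
```

Because each instance has its own mask, two nearby or bent separation lines stay distinct, which is the instance-level discrimination the abstract refers to.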

Paper Structure (20 sections, 11 equations, 18 figures, 10 tables)

Figures (18)

  • Figure 1: Some table samples in the iFLYTAB dataset. (a)-(b) are wired tables. (c)-(e) are wireless tables.
  • Figure 2: Statistics of the iFLYTAB dataset.
  • Figure 3: The visualization of annotated physical coordinates. (a) refers to table cell polygons. (b) refers to text line polygons. Best viewed zoomed in.
  • Figure 4: The visualization of annotated row/column information. The green polygons are the annotated row/column information. (a) refers to row information. (b) refers to column information. Best viewed zoomed in.
  • Figure 5: The overall architecture of SEMv2. $\boldsymbol{F}$ is the feature map generated by fusing the FPN feature maps ($P_2$ to $P_5$). The Splitter module consists of a Kernel Branch and a Feature Branch and predicts the table separation lines between different columns or rows, which can be further processed to obtain the Table Grid Structure. The Merger module predicts the table cell to which each table grid belongs. We omit the Embedder module for simplicity.
  • ...and 13 more figures
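
The Figure 5 caption states that the Merger predicts the table cell to which each table grid belongs. As a rough illustration of what that output enables, the sketch below converts a grid-to-cell assignment into row and column spans. The label-map representation and the function name `cells_from_grid_labels` are assumptions for this example, not the paper's actual output format.

```python
import numpy as np


def cells_from_grid_labels(labels):
    """Convert a grid-to-cell assignment into inclusive cell spans.

    labels: (M, N) integer array; grids sharing a label belong to one cell
            (hypothetical representation of the merger output).
    returns: {label: (row_start, row_end, col_start, col_end)}
    """
    spans = {}
    for label in np.unique(labels):
        rows, cols = np.nonzero(labels == label)
        spans[int(label)] = (
            int(rows.min()), int(rows.max()),
            int(cols.min()), int(cols.max()),
        )
    return spans


# A 3x3 grid where cell 0 spans two columns and cells 1 and 2 span two rows.
labels = np.array([
    [0, 0, 1],
    [2, 3, 1],
    [2, 4, 5],
])
spans = cells_from_grid_labels(labels)
```

The spans map directly onto row-span and column-span attributes of a structured table, under the assumption that every merged cell is rectangular.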