TableCenterNet: A one-stage network for table structure recognition

Anyi Xiao, Cihui Yang

TL;DR

TableCenterNet is a one-stage, end-to-end table structure recognition (TSR) network that regresses the spatial and logical locations of cells in parallel, using a Cycle-Pairing Module and interpolation maps to align physical and logical indices without multi-stage post-processing. By predicting cell centers, corners, and row/column spans within a single framework, it remains robust across diverse table layouts and achieves state-of-the-art performance on the TableGraph-24k dataset. Compared with two-stage methods, it is simpler to train and faster at inference, while maintaining or improving accuracy on challenging benchmarks such as ICDAR-2013 and WTW (Wired Table in the Wild), and it enables efficient deployment on edge devices. Overall, TableCenterNet shows that one-stage TSR with joint spatial-logical regression can adapt across scenarios and handle complex cell merging, which makes it practical for document understanding pipelines.

Abstract

Table structure recognition aims to parse tables in unstructured data into machine-understandable formats. Recent methods address this problem through a two-stage process or optimized one-stage approaches. However, these methods either require multiple networks to be trained serially and perform time-consuming sequential decoding, or rely on complex post-processing algorithms to parse the logical structure of tables. They struggle to balance cross-scenario adaptability, robustness, and computational efficiency. In this paper, we propose a one-stage end-to-end table structure parsing network called TableCenterNet. This network unifies the prediction of table spatial and logical structure into a parallel regression task for the first time, and implicitly learns the mapping between the spatial and logical locations of cells through a synergistic architecture of shared feature extraction layers and task-specific decoding. Compared with two-stage methods, our method is easier to train and faster at inference. Experiments on benchmark datasets show that TableCenterNet can effectively parse table structures in diverse scenarios and achieves state-of-the-art performance on the TableGraph-24k dataset. Code is available at https://github.com/dreamy-xay/TableCenterNet.
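
To make the idea of parallel spatial and logical regression more concrete, the following is a minimal sketch of how per-pixel outputs could be decoded at detected cell centers. It is not the authors' implementation: the function names (`find_peaks`, `decode_cells`), the map layouts (`size_map` holding a width and height, `logic_map` holding start/end row and column indices), and the threshold are all assumptions chosen for illustration. The point it shows is that once a center is found, both the spatial box and the logical indices are read from the same location, with no sequential decoding step.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]


def find_peaks(heatmap: Grid, threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return (y, x) positions that reach the threshold and are 3x3 local maxima."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heatmap[y][x]
            if v < threshold:
                continue
            neighbours = [
                heatmap[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            ]
            if all(v >= n for n in neighbours):
                peaks.append((y, x))
    return peaks


def decode_cells(heatmap, size_map, logic_map, threshold: float = 0.5) -> List[Dict]:
    """Read the spatial box and the logical indices at every center peak.

    Hypothetical layouts (assumptions, not taken from the paper):
      size_map[y][x]  = (width, height) of the cell centered at (x, y)
      logic_map[y][x] = (row_start, row_end, col_start, col_end) as floats
    """
    cells = []
    for y, x in find_peaks(heatmap, threshold):
        w, h = size_map[y][x]
        # Regressed logical indices are continuous, so snap them to integers.
        r0, r1, c0, c1 = (int(round(v)) for v in logic_map[y][x])
        cells.append({
            "box": (x - w / 2, y - h / 2, x + w / 2, y + h / 2),
            "rows": (r0, r1),
            "cols": (c0, c1),
        })
    return cells
```

A cell whose `rows` or `cols` pair spans more than one index corresponds to a merged cell. A real decoder would also refine boxes with corner predictions, which this sketch leaves out.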

Paper Structure

This paper contains 19 sections, 17 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Comparison of our one-stage approach with previous two-stage approaches.
  • Figure 2: The architecture of our proposed method. Each "Conv" in the illustration represents a regression head, which comprises a $3\times3$ convolution followed by a $1\times1$ convolution.
  • Figure 3: Visualization of row and column interpolation maps generated by the algorithm.
  • Figure 4: Visualization of cell spatial location alignment. (a) is the result of cell spatial location regression, (b) is the result of cell corner point detection, and (c) is the result after cell spatial location alignment.
  • Figure 5: Flow of converting cells into grids. First, based on the physical coordinates and logical indexes of cell corners, row and column dividers are grouped. Then, the two types of dividers are completed by fitting and intersecting with each other. Finally, the completed row and column dividers are fused by corners to generate logical grids.
  • ...and 2 more figures
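
The cell-to-grid flow summarized in the Figure 5 caption (group dividers by the logical indexes of corners, complete them by fitting, then intersect rows with columns) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: each corner is assumed to arrive as `(x, y, row_index, col_index)`, dividers are assumed to be straight lines fitted by least squares, and the helper names (`fit_line`, `build_dividers`, `intersect`, `grid_points`) are hypothetical. The final corner-based fusion step from the caption is omitted.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]   # (x, y) in image pixels
Line = Tuple[float, float]    # (slope, intercept)


def fit_line(us: List[float], vs: List[float]) -> Line:
    """Least-squares fit of v = slope * u + intercept."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    var_u = sum((u - mean_u) ** 2 for u in us)
    if var_u == 0.0:
        # Single point or identical u values: fall back to a flat line.
        return 0.0, mean_v
    cov = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    slope = cov / var_u
    return slope, mean_v - slope * mean_u


def build_dividers(corners) -> Tuple[Dict[int, Line], Dict[int, Line]]:
    """Group corners by logical index, then fit one line per divider."""
    row_pts = defaultdict(list)
    col_pts = defaultdict(list)
    for x, y, r, c in corners:
        row_pts[r].append((x, y))
        col_pts[c].append((x, y))
    # Row dividers are near-horizontal: model y as a function of x.
    rows = {r: fit_line([p[0] for p in pts], [p[1] for p in pts])
            for r, pts in row_pts.items()}
    # Column dividers are near-vertical: model x as a function of y.
    cols = {c: fit_line([p[1] for p in pts], [p[0] for p in pts])
            for c, pts in col_pts.items()}
    return rows, cols


def intersect(row: Line, col: Line) -> Point:
    """Intersect y = a1 * x + b1 with x = a2 * y + b2."""
    a1, b1 = row
    a2, b2 = col
    denom = 1.0 - a1 * a2
    if abs(denom) < 1e-9:
        raise ValueError("dividers are parallel")
    y = (a1 * b2 + b1) / denom
    return a2 * y + b2, y


def grid_points(corners) -> Dict[Tuple[int, int], Point]:
    """Return every (row divider, column divider) crossing, including
    crossings for which no corner was detected."""
    rows, cols = build_dividers(corners)
    return {(r, c): intersect(rows[r], cols[c])
            for r in sorted(rows) for c in sorted(cols)}
```

Because every row divider is intersected with every column divider, a grid point is recovered even when its corner was never detected, which is the "completion" effect the caption describes. Curved or heavily distorted tables would likely need a richer divider model than a straight line.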