TableCenterNet: A one-stage network for table structure recognition
Anyi Xiao, Cihui Yang
TL;DR
TableCenterNet is a one-stage, end-to-end table structure parsing network that regresses the spatial and logical locations of cells in parallel, leveraging a Cycle-Pairing Module and interpolation maps to align physical and logical indices without multi-stage post-processing. By predicting cell centers, corners, and row/column spans within a unified framework, it remains robust across diverse table layouts and achieves state-of-the-art performance on the TableGraph-24k dataset. The approach reduces training complexity and speeds up inference compared to two-stage methods, while maintaining or improving accuracy on challenging benchmarks such as ICDAR-2013 and WTW (Wired Table in the Wild), and enabling efficient deployment on edge devices. Overall, TableCenterNet demonstrates that end-to-end, one-stage TSR with spatial-logical regression can generalize across scenarios and handle complex cell merging, which makes it practical for document understanding pipelines.
Abstract
Table structure recognition aims to parse tables in unstructured data into machine-understandable formats. Recent methods address this problem through either a two-stage process or an optimized one-stage approach. However, these methods either require multiple networks to be trained serially and rely on time-consuming sequential decoding, or depend on complex post-processing algorithms to parse the logical structure of tables. As a result, they struggle to balance cross-scenario adaptability, robustness, and computational efficiency. In this paper, we propose a one-stage end-to-end table structure parsing network called TableCenterNet. For the first time, this network unifies the prediction of table spatial and logical structure into a parallel regression task, and implicitly learns the spatial-logical location mapping laws of cells through a synergistic architecture of shared feature extraction layers and task-specific decoding. Compared with two-stage methods, our method is easier to train and faster at inference. Experiments on benchmark datasets show that TableCenterNet can effectively parse table structures in diverse scenarios and achieve state-of-the-art performance on the TableGraph-24k dataset. Code is available at https://github.com/dreamy-xay/TableCenterNet.
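To make the idea of parallel spatial and logical regression concrete, the following is a minimal sketch of how per-cell outputs could be read from dense prediction maps in a single pass. It is not the authors' implementation: the map names (`heatmap`, `size_map`, `logic_map`), the `Cell` type, and the peak-picking rule are all assumptions chosen for illustration, and the real network heads and decoding may differ.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    center: tuple  # (x, y) position of the detected cell center
    box: tuple     # (x1, y1, x2, y2) spatial extent
    rows: tuple    # (start_row, end_row) logical index
    cols: tuple    # (start_col, end_col) logical index


def find_peaks(heatmap, threshold=0.5):
    """Return (x, y) positions that are local maxima (8-neighbourhood) at or above threshold."""
    height, width = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            value = heatmap[y][x]
            if value < threshold:
                continue
            neighbours = [
                heatmap[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
                if (i, j) != (x, y)
            ]
            if all(value >= n for n in neighbours):
                peaks.append((x, y))
    return peaks


def decode_cells(heatmap, size_map, logic_map, threshold=0.5):
    """Read spatial and logical predictions at each center; no sequential decoding step."""
    cells = []
    for x, y in find_peaks(heatmap, threshold):
        w, h = size_map[y][x]             # hypothetical spatial head: cell width and height
        r0, r1, c0, c1 = logic_map[y][x]  # hypothetical logical head: continuous row/column indices
        cells.append(Cell(
            center=(x, y),
            box=(x - w / 2, y - h / 2, x + w / 2, y + h / 2),
            rows=(round(r0), round(r1)),  # snap regressed values to integer indices
            cols=(round(c0), round(c1)),
        ))
    return cells


# Toy input: a 3x5 grid holding two cells side by side in one table row.
heatmap = [[0.1] * 5 for _ in range(3)]
size_map = [[(0, 0)] * 5 for _ in range(3)]
logic_map = [[(0, 0, 0, 0)] * 5 for _ in range(3)]
heatmap[1][1], size_map[1][1], logic_map[1][1] = 0.9, (2, 2), (0.1, 0.0, -0.2, 0.1)
heatmap[1][3], size_map[1][3], logic_map[1][3] = 0.8, (2, 2), (0.0, 0.1, 0.9, 1.2)

cells = decode_cells(heatmap, size_map, logic_map)
for cell in cells:
    print(cell)
```

In this sketch both the box and the row/column indices are looked up at the same center location, so the logical structure comes directly from the regressed maps rather than from a separate post-processing stage over detected boxes.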
