TableBank: A Benchmark Dataset for Table Detection and Recognition

Minghao Li; Lei Cui; Shaohan Huang; Furu Wei; Ming Zhou; Zhoujun Li

TL;DR

TableBank is a large-scale dataset for image-based table detection and recognition, built with weak supervision from Word and LaTeX documents and containing 417K labeled tables across diverse domains. It establishes strong baselines using Faster R-CNN for table detection and an image-to-markup encoder-decoder for table structure recognition. The baselines show domain-specific performance, with better cross-domain generalization when trained on mixed-domain data. The results highlight the need for large, varied training data in robust table analysis and show that deep learning methods outperform traditional OCR-based tools on this task. The authors release TableBank publicly and plan to extend it to additional domains and finer-grained document components.

Abstract

We present TableBank, a new image-based table detection and recognition dataset built with novel weak supervision from Word and LaTeX documents on the internet. Existing research on image-based table detection and recognition usually fine-tunes pre-trained models on out-of-domain data with a few thousand human-labeled examples, an approach that generalizes poorly to real-world applications. With TableBank, which contains 417K high-quality labeled tables, we build several strong baselines using state-of-the-art deep neural network models. We make TableBank publicly available and hope it will empower more deep learning approaches to the table detection and recognition task. The dataset and models are available at https://github.com/doc-analysis/TableBank.
