Uncertainty Quantification in Table Structure Recognition

Kehinde Ajayi; Leizhen Zhang; Yi He; Jian Wu

TL;DR

This paper tackles uncertainty quantification (UQ) in table structure recognition (TSR) by introducing a Test-Time Augmentation-based UQ pipeline, termed TTA-m, which fine-tunes multiple TSR models on augmented data and ensembles their predictions. The ensemble merges predictions whose overlap exceeds an IoU threshold $\theta_0$ and assigns each cell a confidence score $\frac{c}{M+1}$, where $c$ is the number of models contributing to that cell. To enable evaluation without ground-truth uncertainty, the authors propose two proxies: masking (varying pixel intensities) and cell complexity (modeled via a cell adjacency graph). Evaluations on the ICDAR-19 TSR dataset with CascadeTabNet show that TTA-m improves cell-detection F1 and provides reliable confidence-based uncertainty estimates, albeit at a higher computational cost. The approach advances practical TSR by enabling selective human review and lays the groundwork for robust, uncertainty-aware document understanding systems, especially in scientific domains.
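The confidence score described above can be illustrated with a minimal sketch. It is not the authors' implementation: the greedy grouping rule, the choice of the first box as a group's representative, the default `theta0=0.5`, and the function names `iou` and `ensemble_confidence` are all assumptions made for illustration. The sketch also reads the denominator $M+1$ as the total number of models in the ensemble, which agrees with the 3/3, 2/3, 1/3 example in Figure 3.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ensemble_confidence(
    predictions: List[List[Box]], theta0: float = 0.5
) -> List[Tuple[Box, float]]:
    """Group boxes across models by IoU and score each group as c / (M + 1).

    predictions[i] holds the cell boxes predicted by model i.
    Assumption: greedy grouping; each model contributes at most once per group.
    """
    n_models = len(predictions)  # M + 1 in the text's notation (assumed)
    groups = []  # each group: {"box": representative box, "models": set of model ids}
    for model_id, boxes in enumerate(predictions):
        for box in boxes:
            best, best_score = None, theta0
            for group in groups:
                if model_id in group["models"]:
                    continue  # this model already contributed to the group
                score = iou(box, group["box"])
                if score >= best_score:
                    best, best_score = group, score
            if best is None:
                groups.append({"box": box, "models": {model_id}})
            else:
                best["models"].add(model_id)
    return [(g["box"], len(g["models"]) / n_models) for g in groups]


# Hypothetical boxes from three models.
preds = [
    [(0, 0, 10, 10), (20, 0, 30, 10)],
    [(0, 0, 10, 10), (20, 0, 30, 11)],
    [(1, 0, 10, 10)],
]
scored = ensemble_confidence(preds, theta0=0.5)
```

Cells with a low score are the natural candidates for human review, since few models agreed on them.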

Abstract

Quantifying the uncertainty of machine learning models is a critical step toward reducing human verification effort, because it flags predictions with low confidence. This paper proposes a method for uncertainty quantification (UQ) of table structure recognition (TSR). The proposed UQ method builds on a mixture-of-experts approach termed Test-Time Augmentation (TTA). Our key idea is to enrich and diversify the table representations in order to spotlight the cells with high recognition uncertainty. To evaluate its effectiveness, we propose two heuristics to differentiate highly uncertain cells from normal cells, namely masking and cell complexity quantification. Masking varies the pixel intensity to assess the detection uncertainty. Cell complexity quantification gauges the uncertainty of each cell by its topological relation with neighboring cells. The evaluation results on standard benchmark datasets demonstrate that the proposed method is effective in quantifying uncertainty in TSR models. To the best of our knowledge, this study is the first of its kind to enable UQ in TSR tasks. Our code and data are available at: https://github.com/lamps-lab/UQTTA.git.
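The abstract describes masking only as varying the pixel intensity, so the following is a loose sketch of one possible reading, not the authors' procedure. It assumes a grayscale image stored as nested lists, a white background of 255, and a hypothetical helper `mask_region` that fades the pixels inside a cell box toward the background by a factor `alpha`.

```python
from typing import List, Tuple


def mask_region(
    image: List[List[int]],
    box: Tuple[int, int, int, int],
    alpha: float,
    background: int = 255,
) -> List[List[int]]:
    """Return a copy of `image` with pixels inside `box` blended toward `background`.

    box is (x1, y1, x2, y2) with exclusive upper bounds.
    alpha = 0 leaves the region unchanged; alpha = 1 replaces it with the background.
    Assumption: linear blending is only one way to "vary pixel intensity".
    """
    x1, y1, x2, y2 = box
    out = [row[:] for row in image]  # copy so the input is not mutated
    for y in range(y1, y2):
        for x in range(x1, x2):
            out[y][x] = round((1 - alpha) * image[y][x] + alpha * background)
    return out


# Hypothetical 2x3 image: fade the left 2x2 block halfway toward white.
img = [[55, 55, 0], [55, 55, 0]]
faded = mask_region(img, (0, 0, 2, 2), alpha=0.5)
```

Under this reading, cells masked more strongly would be expected to be harder to detect, which gives a controllable proxy for uncertainty.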

Paper Structure (24 sections, 7 figures, 2 tables)

Introduction
Related Work
Uncertainty Quantification
Table Structure Recognition
TTA-m: Proposed UQ Pipeline
Data Augmentation
Fine-tuning A Pre-trained TSR Model
Predictions With Fine-tuned Model
Uncertainty Estimation via Ensembles
Evaluation
Masking
Cell Complexity Quantification
Experimental Setup
Data
Baseline Methods
...and 9 more sections

Figures (7)

Figure 1: A schematic illustration of the proposed UQ pipeline (TTA-m). In the training phase, we fine-tune the TSR model on the original tables and on the augmented tables. In the test phase, each model makes predictions on table images similar to those it was trained on, and the model outputs are then ensembled. NAT: Non-Augmented Tables, NLT: No Lines Tables, HLT: Horizontal Lines Tables, VLT: Vertical Lines Tables.
Figure 2: Augmentation examples of a table image.
Figure 3: A schematic illustration of how to calculate confidence scores using bounding boxes predicted by three models (a, b, and c). Red color: 3/3 = 100% confidence, Green color: 2/3 = 66.7% confidence, Other colors: 1/3 = 33.3% confidence.
Figure 4: An example of the graph model of a table. Each cell is enclosed by a red box, with an ID labeled next to it. The dashed lines represent the connections of a cell to its adjacent cells and can be used to count the adjacency degree of a cell. For instance, cell 5 is connected by 2 green lines, so it has an adjacency degree of 2.
Figure 5: A schematic comparison of the TTA variants implemented in this paper. TTA-m is the proposed variant because it achieves the highest F1 among them (Table \ref{tab:retrain}).
...and 2 more figures
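The adjacency degree described in the Figure 4 caption can be sketched as a simple count over graph edges. This is an illustration under stated assumptions: the edge list below is made up and does not reproduce the table in Figure 4, the function name `adjacency_degrees` is hypothetical, and how the paper decides that two cells are adjacent is not covered here.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def adjacency_degrees(
    edges: Iterable[Tuple[Hashable, Hashable]]
) -> Dict[Hashable, int]:
    """Count how many distinct neighbors each cell has in an undirected graph."""
    # Treat (u, v) and (v, u) as the same connection.
    unique = {frozenset(edge) for edge in edges}
    degrees: Counter = Counter()
    for pair in unique:
        if len(pair) != 2:
            continue  # ignore self-loops such as (5, 5)
        for cell in pair:
            degrees[cell] += 1
    return dict(degrees)


# Hypothetical cell adjacency graph; cell 5 has two neighbors (2 and 6).
example_edges = [(1, 2), (2, 3), (2, 5), (5, 6), (6, 5)]
degrees = adjacency_degrees(example_edges)
```

A cell with a higher degree touches more neighbors, which is the sense in which the cell complexity proxy treats it as harder to recognize.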
