LATTE: Improving Latex Recognition for Tables and Formulae with Iterative Refinement

Nan Jiang; Shanchao Liang; Chengxiao Wang; Jiannan Wang; Lin Tan

TL;DR

LATTE introduces the first iterative-refinement framework for LaTeX recognition of both formulae and tables. It combines a generation model ($M_G$) with a fault localization model ($M_F$) and a refinement model ($M_R$), both guided by delta-view feedback produced by the ImageEdit algorithm. The delta-view highlights pixel-column differences between the ground-truth image and the rendering of the draft LaTeX, so that refinement can localize the faulty parts and regenerate only those portions of the source. Evaluated on IM2LATEX-100K (formulae) and the new TAB2LATEX dataset (tables), LATTE improves exact-match accuracy over prior methods and commercial tools, with successful refinement rates of 46.08% (formulae) and 25.51% (tables). The work also provides the TAB2LATEX dataset, detailed training regimens, and the prompts used with state-of-the-art LLMs, illustrating the value of iterative refinement and delta-view feedback for end-to-end LaTeX extraction from document images.
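To make the idea of pixel-column differences concrete, here is a minimal sketch that aligns the columns of two small binary images and reports the column ranges that do not match. It is an illustration only: it uses Python's standard `difflib.SequenceMatcher` as a stand-in for sequence alignment, and the function name `column_delta` and the list-of-rows image format are assumptions, not the paper's ImageEdit algorithm.

```python
from difflib import SequenceMatcher


def column_delta(expected, rendered):
    """Return the non-matching column ranges between two images.

    Each image is a list of rows of equal length; pixels are 0 (background)
    or 1 (ink). The result is a list of (tag, expected_range, rendered_range)
    tuples, where tag is 'replace', 'delete', or 'insert'.
    """
    # Treat each image as a sequence of columns; tuples are hashable,
    # which SequenceMatcher requires.
    exp_cols = list(zip(*expected))
    ren_cols = list(zip(*rendered))

    matcher = SequenceMatcher(a=exp_cols, b=ren_cols, autojunk=False)
    deltas = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            deltas.append((tag, (i1, i2), (j1, j2)))
    return deltas


# Toy example: the rendered draft is missing the third column of the target.
expected = [[0, 1, 1, 0],
            [0, 1, 0, 1]]
rendered = [[0, 1, 0],
            [0, 1, 1]]
print(column_delta(expected, rendered))
```

Aligning columns rather than comparing pixels at the same coordinates means a single missing symbol does not mark everything to its right as different, which is what makes the highlighted region usable as a localization hint.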

Abstract

Portable Document Format (PDF) files are dominantly used for storing and disseminating scientific research, legal documents, and tax information. LaTeX is a popular application for creating PDF documents. Despite its advantages, LaTeX is not WYSIWYG (what you see is what you get): the LaTeX source and the rendered PDF images look drastically different, especially for formulae and tables. This gap makes it hard to modify or export LaTeX sources for formulae and tables from PDF images, and existing work is still limited. First, prior work generates LaTeX sources in a single iteration and struggles with complex LaTeX formulae. Second, existing work mainly recognizes and extracts LaTeX sources for formulae, and is incapable or ineffective for tables. This paper proposes LATTE, the first iterative refinement framework for LaTeX recognition. Specifically, we propose delta-view as feedback, which compares the rendered image of the extracted LaTeX source with the expected correct image and pinpoints the differences between them. Such delta-view feedback enables our fault localization model to localize the faulty parts of the incorrect recognition more accurately and enables our LaTeX refinement model to repair the incorrect extraction more accurately. LATTE improves the LaTeX source extraction accuracy of both LaTeX formulae and tables, outperforming existing techniques as well as GPT-4V by at least 7.03% in exact match, with a successful refinement rate of 46.08% (formula) and 25.51% (table).
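The generate, compare, localize, refine cycle described above can be sketched as a short control loop. This is a hedged outline rather than the authors' implementation: the callables `generate`, `render`, `localize`, and `refine` are hypothetical stand-ins for $M_G$, a LaTeX renderer, $M_F$, and $M_R$, and the stopping rule (stop when the rendering equals the input image, or after `max_rounds` rounds) is an assumption.

```python
def iterative_refine(image, generate, render, localize, refine, max_rounds=2):
    """Draft LaTeX for `image`, then refine it while the rendering differs.

    All four callables are hypothetical placeholders:
      generate(image) -> draft source
      render(draft) -> rendered image
      localize(image, rendered, draft) -> fault location in the draft
      refine(image, rendered, draft, location) -> new draft
    """
    draft = generate(image)
    for _ in range(max_rounds):
        rendered = render(draft)
        if rendered == image:
            # The draft already reproduces the input; nothing to repair.
            break
        location = localize(image, rendered, draft)
        draft = refine(image, rendered, draft, location)
    return draft
```

Passing the models in as callables keeps the loop independent of any particular model or renderer, so the same skeleton applies to formulae and to tables.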

Paper Structure (43 sections, 2 equations, 13 figures, 4 tables, 3 algorithms)

Introduction
Approach
Generation Phase
Evaluation and Feedback Generation
Iterative-Refinement Phase
Fault Localization Model
Refinement Model with Fault Location
Experimental Setup
Datasets
Formula Models Training
Table Models Training
Results
RQ1: Latte Recognition Accuracy
Formulae
Tables
...and 28 more sections

Figures (13)

Figure 1: Overview of Latte. $M_G$ is the initial LaTeX source generation model, $M_F$ is the fault localization model, and $M_R$ is the refinement model.
Figure 2: Formula and table examples of delta-view generated by the ImageEdit algorithm.
Figure 3: Fault localization model architecture.
Figure 4: Workflow of the refinement model.
Figure 5: Example of Latte$_2$'s correct refinement.
...and 8 more figures
