TAMER: Tree-Aware Transformer for Handwritten Mathematical Expression Recognition

Jianhua Zhu, Wenqi Zhao, Yu Li, Xingjian Hu, Liangcai Gao

TL;DR

Handwritten Mathematical Expression Recognition (HMER) must both capture the hierarchical syntax of mathematics and produce grammatically valid LaTeX. TAMER is a Tree-Aware Transformer that jointly optimizes sequence decoding and explicit tree-structure prediction, using a Tree-Aware Module during training and a Tree Structure Prediction Scoring Mechanism during inference. The model is trained with the combined loss $L = L_{seq} + L_{struct}$ and uses a Transformer-based encoder–decoder with a DenseNet visual front end to produce syntactically valid LaTeX. It achieves state-of-the-art ExpRate on the CROHME datasets (61.23%, 60.26%, 61.97%), strong results on HME100K, and improved bracket matching. These results suggest that coupling sequence and tree reasoning improves generalization to complex expressions and makes handwritten math recognition more reliable in practice.

Abstract

Handwritten Mathematical Expression Recognition (HMER) has extensive applications in automated grading and office automation. However, existing sequence-based decoding methods, which directly predict $\LaTeX$ sequences, struggle to understand and model the inherent tree structure of $\LaTeX$ and often fail to ensure syntactic correctness in the decoded results. To address these challenges, we propose a novel model named TAMER (Tree-Aware Transformer) for handwritten mathematical expression recognition. TAMER introduces an innovative Tree-aware Module while maintaining the flexibility and efficient training of Transformer. TAMER combines the advantages of both sequence decoding and tree decoding models by jointly optimizing sequence prediction and tree structure prediction tasks, which enhances the model's understanding and generalization of complex mathematical expression structures. During inference, TAMER employs a Tree Structure Prediction Scoring Mechanism to improve the structural validity of the generated $\LaTeX$ sequences. Experimental results on CROHME datasets demonstrate that TAMER outperforms traditional sequence decoding and tree decoding models, especially in handling complex mathematical structures, achieving state-of-the-art (SOTA) performance.
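
The joint objective $L = L_{seq} + L_{struct}$ can be illustrated with a minimal sketch. It assumes, purely for illustration, that both terms are cross-entropy losses averaged over sequence positions: one over the token vocabulary, and one over candidate parent positions for each token. The function names (`cross_entropy`, `joint_loss`) and the averaging are hypothetical and are not taken from the paper's implementation.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def joint_loss(token_logits, token_targets, parent_logits, parent_targets):
    """Return (L, L_seq, L_struct) with L = L_seq + L_struct.

    token_logits[i]  : scores over the vocabulary at position i
    token_targets[i] : index of the ground-truth token at position i
    parent_logits[i] : scores over candidate parent positions for token i
                       (an assumed reading of the relationship score matrix)
    parent_targets[i]: index of the ground-truth parent of token i
    """
    n = len(token_targets)
    l_seq = sum(
        cross_entropy(l, t) for l, t in zip(token_logits, token_targets)
    ) / n
    l_struct = sum(
        cross_entropy(l, t) for l, t in zip(parent_logits, parent_targets)
    ) / n
    return l_seq + l_struct, l_seq, l_struct
```

With uniform scores, each term reduces to the log of the number of choices, which gives a quick sanity check: four vocabulary entries and two candidate parents yield $L = \ln 4 + \ln 2 = \ln 8$.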

Paper Structure (24 sections, 8 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: Bracket Matching Accuracy under different structural complexities on CROHME 2014 (in %). TAMER maintains a bracket matching accuracy of over 92% across all levels of structural complexity, significantly outperforming CoMER and ICAL.
  • Figure 2: The architecture of TAMER. TAMER has 4 components: (1) Visual Encoder: DenseNet. (2) Sinusoidal Positional Encoding: image and word. (3) Decoder: Transformer Decoder with Coverage Attention. (4) Tree-Aware Module (TAM).
  • Figure 3: The architecture of the Tree-Aware Module (TAM). In the relationship score matrix, the dark red position indicates where the $i^{th}$ character is the child node and the $j^{th}$ character is the parent node.
  • Figure 4: Examples of structural complexity for different expressions.
  • Figure 5: ExpRate under different structural complexities on CROHME 2014 (in %).
  • ...and 3 more figures
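
The bracket matching accuracy reported in Figure 1 can be illustrated with a minimal sketch. The exact definition of the metric is not given in this summary, so the code assumes one plausible reading: the fraction of predicted token sequences whose grouping brackets are properly nested and closed. The helper names `brackets_match` and `bracket_matching_accuracy` are hypothetical.

```python
def brackets_match(tokens, pairs=(("{", "}"),)):
    """Return True if every opening bracket in `tokens` is closed in order.

    `tokens` is a list of LaTeX tokens, e.g. ["x", "^", "{", "2", "}"].
    Only LaTeX grouping braces are checked by default; pass extra pairs
    such as ("(", ")") to check them as well. Escaped braces like "\\{"
    are distinct tokens and are ignored.
    """
    closer_for = {opener: closer for opener, closer in pairs}
    closers = set(closer_for.values())
    stack = []  # closers still expected, innermost last
    for tok in tokens:
        if tok in closer_for:
            stack.append(closer_for[tok])
        elif tok in closers:
            if not stack or stack.pop() != tok:
                return False  # closer with no matching opener
    return not stack  # leftover openers mean unbalanced output


def bracket_matching_accuracy(predictions):
    """Fraction of predicted token sequences with balanced brackets."""
    if not predictions:
        return 0.0
    return sum(brackets_match(p) for p in predictions) / len(predictions)
```

For example, `brackets_match("x ^ { 2 }".split())` is true, while a truncated fraction such as `\frac { a } { b` fails the check. Parentheses are left out by default because mathematical notation sometimes mixes them legitimately, as in half-open intervals.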