Enhancing Complex Formula Recognition with Hierarchical Detail-Focused Network

Jiale Wang, Junhui Yu, Huanyong Liu, Chenanran Kong

TL;DR

MER suffers from multiple valid interpretations of complex formulas, causing parsing ambiguity. This work presents HDR, a large-scale MER dataset with HDR-100M training data and HDR-Test, and introduces HDNet, a Transformer-based encoder-decoder with a hierarchical sub-formula module that crops high-resolution sub-formulas and fuses their features via $Z = \alpha Z_{\text{main}} + (1 - \alpha) \frac{1}{n} \sum_{i=1}^{n} Z_i$. The training objective combines main and sub-formula losses, $L_{\text{total}} = \alpha L_{\text{main}} + (1 - \alpha) \frac{1}{n} \sum_{i=1}^{n} L_i$, including $L_{\text{main}} = -\sum_{t=1}^{T} \log p(y_t \mid y_{<t}, Z)$. A fair evaluation protocol maps predictions to functionally equivalent expressions and uses metrics such as $\text{CR} = 1 - \frac{\text{EditDistance}}{\text{NumberOfCharacters}}$, $\text{AED}$, and BLEU, with HDNet achieving state-of-the-art results on HDR-Test and public MER datasets. Overall, the work provides large-scale data, a detail-focused architecture, and fair evaluation practices that advance reliable MER for complex hierarchical formulas.
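
The fusion rule and the combined loss in the TL;DR can be illustrated with a minimal sketch. It is not the authors' implementation: features are plain Python lists rather than encoder tensors, the function names `fuse_features`, `sequence_nll`, and `total_loss` are hypothetical, and falling back to the main formula alone when there are no sub-formulas is an assumption made here.

```python
import math
from typing import List, Sequence


def fuse_features(z_main: Sequence[float],
                  z_subs: Sequence[Sequence[float]],
                  alpha: float) -> List[float]:
    """Z = alpha * Z_main + (1 - alpha) * (1/n) * sum_i Z_i, element-wise."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not z_subs:
        # Assumption: with no sub-formulas, use the main feature unchanged.
        return list(z_main)
    if any(len(z) != len(z_main) for z in z_subs):
        raise ValueError("all features must share the same dimension")
    n = len(z_subs)
    mean_sub = [sum(z[d] for z in z_subs) / n for d in range(len(z_main))]
    return [alpha * m + (1.0 - alpha) * s for m, s in zip(z_main, mean_sub)]


def sequence_nll(token_probs: Sequence[float]) -> float:
    """L = -sum_t log p(y_t | y_<t, Z), given the probability of each target token."""
    return -sum(math.log(p) for p in token_probs)


def total_loss(l_main: float, l_subs: Sequence[float], alpha: float) -> float:
    """L_total = alpha * L_main + (1 - alpha) * (1/n) * sum_i L_i."""
    if not l_subs:
        # Assumption: with no sub-formulas, only the main loss is used.
        return l_main
    return alpha * l_main + (1.0 - alpha) * sum(l_subs) / len(l_subs)


if __name__ == "__main__":
    fused = fuse_features([1.0, 2.0], [[3.0, 4.0], [5.0, 6.0]], alpha=0.5)
    print(fused)
    print(total_loss(2.0, [1.0, 3.0], alpha=0.5))
```

With $\alpha = 1$ the sub-formula terms vanish and the model reduces to a plain encoder-decoder on the main formula; smaller values give the high-resolution sub-formula crops more weight.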

Abstract

Hierarchical and complex Mathematical Expression Recognition (MER) is challenging because a formula can have multiple possible interpretations, which complicates both parsing and evaluation. In this paper, we introduce the Hierarchical Detail-Focused Recognition dataset (HDR), the first dataset specifically designed to address these issues. It consists of a large-scale training set, HDR-100M, which offers unprecedented scale and diversity with one hundred million training instances, and a test set, HDR-Test, which includes multiple interpretations of complex hierarchical formulas for comprehensive evaluation of model performance. Additionally, the parsing of complex formulas often suffers from errors in fine-grained details. To address this, we propose the Hierarchical Detail-Focused Recognition Network (HDNet), a framework that incorporates a hierarchical sub-formula module focused on the precise handling of formula details, thereby significantly enhancing MER performance. Experimental results demonstrate that HDNet outperforms existing MER models across various datasets.

Paper Structure (14 sections, 3 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: The model fails to capture intricate details in complex formulas, misrecognizing $-\frac{1}{2}z^2$ as $\frac{1}{2}z^2$.
  • Figure 2: In the training process (left), the formulas are parsed hierarchically based on their labels. Each formula is split, rendered, and resized into sub-formulas. The main formula is also rendered and resized. Both the main formula and sub-formulas are fed into the encoder to extract features. The sub-formula features are then fused with the main formula's feature through weighted aggregation to provide additional visual details. The weighted features are passed to the decoder to predict the result for the main formula. Additionally, each sub-formula feature is separately passed to the decoder to predict sub-formula results. The model's optimization objective includes the loss of the main formula, $L_{\text{main}}$, and the sum of the losses of the sub-formulas, $\sum_{i=1}^{n} L_i$. The predicted results are evaluated (right). We provide a fair evaluation method where even if two formulas differ at the character level, they are considered correctly parsed if they are functionally equivalent.
  • Figure 3: Comparison of datasets Im2latex-100k, UniMER-1M, and HDR, showing the number of hierarchical layers and the number of lines. Darker colors indicate higher complexity. The bar length represents total data volume.
  • Figure 4: Comparison of different models based on parameter counts (represented by the area of circles) and Fair-Character Recall on the HDR dataset. Larger circles represent models with more parameters, while the vertical position reflects the Fair-Character Recall performance.
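
The Fair-Character Recall referred to in the Figure 2 and Figure 4 captions can be sketched from the definition CR = 1 - EditDistance / NumberOfCharacters. The sketch below rests on several assumptions that are not stated in the source: the edit distance is character-level Levenshtein distance, NumberOfCharacters is the length of the reference string, and "fairness" is approximated by scoring a prediction against an explicit list of accepted, functionally equivalent references and keeping the best score. The function names are hypothetical.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_recall(pred: str, ref: str) -> float:
    """CR = 1 - EditDistance / NumberOfCharacters (reference length assumed)."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return 1.0 - edit_distance(pred, ref) / len(ref)


def fair_character_recall(pred: str, refs: Sequence[str]) -> float:
    """Best CR over all accepted references (assumed stand-in for equivalence mapping)."""
    if not refs:
        raise ValueError("at least one reference is required")
    return max(character_recall(pred, r) for r in refs)


if __name__ == "__main__":
    # The Figure 1 failure case: the leading minus sign is dropped.
    print(character_recall(r"\frac{1}{2}z^2", r"-\frac{1}{2}z^2"))
    # Two notations for the same expression; either one counts as correct.
    print(fair_character_recall("x^2", ["x^{2}", "x^2"]))
```

Against the single reference `x^{2}` the prediction `x^2` would be penalised for two missing braces, while the multi-reference variant scores it as fully correct, which is the behaviour the fair evaluation aims for.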