Automated LaTeX Code Generation from Handwritten Math Expressions Using Vision Transformer
Jayaprakash Sundararaj, Akhil Vyas, Benjamin Gonzalez-Maldonado
TL;DR
The paper tackles converting handwritten mathematical expressions into LaTeX code by framing the problem as an image-to-sequence task solved with encoder–decoder architectures. It systematically compares a CNN encoder with an LSTM decoder, a fine-tuned pretrained ResNet50 encoder, and a Vision Transformer with a transformer-based decoder. Results show that the Vision Transformer models achieve higher accuracy and BLEU-4 scores and lower Levenshtein distances than the CNN–LSTM and ResNet–LSTM baselines, highlighting the effectiveness of self-attention and patch-based representations for this multimodal task. The study also demonstrates the benefits of transfer learning and provides an open implementation to enable reproducibility and further research in automated mathematical transcription.
Abstract
Converting mathematical expressions into LaTeX is a challenging task. In this paper, we examine the application of transformer-based architectures to converting images of handwritten or digital mathematical expressions into the corresponding LaTeX code. As a baseline, we use the current state-of-the-art architecture, a CNN encoder with an LSTM decoder. We also explore an enhancement to this CNN-RNN architecture that replaces the CNN encoder with a pretrained ResNet50 model, modified to suit grayscale input. Further, we experiment with a vision transformer model and compare it with the baseline and the ResNet50-based CNN-LSTM model. Our findings reveal that the vision transformer architectures outperform the baseline CNN-RNN framework, delivering higher overall accuracy and BLEU scores while achieving lower Levenshtein distances. Moreover, these results highlight the potential for further improvement through fine-tuning of model parameters. To encourage open research, we also provide the model implementation, enabling reproduction of our results and facilitating further research in this domain.
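
To make the Levenshtein distance metric concrete, the following is a minimal Python sketch of how an edit distance between a predicted and a reference LaTeX sequence could be computed. It is an illustration rather than the evaluation code used in this work: the function name `levenshtein`, the whitespace tokenization, and the example expressions are assumptions made here, and the metric could equally be computed at the character level.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between the first i-1 items of a and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]  # distance from the first i items of a to an empty sequence
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(
                prev[j] + 1,         # delete item_a
                curr[j - 1] + 1,     # insert item_b
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


# Hypothetical example: whitespace-separated LaTeX tokens (an assumed tokenization).
reference = r"\frac { a } { b }".split()
predicted = r"\frac { a } { c }".split()
distance = levenshtein(predicted, reference)
print(distance)
```

A lower value means the predicted sequence needs fewer edits to match the reference, which is why lower Levenshtein distances indicate better transcriptions. Operating on tokens rather than characters keeps a single wrong command such as `\frac` from being counted as several errors.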
