Fine-Tuning BERTs for Definition Extraction from Mathematical Text
Lucy Horowitz, Ryan Hathaway
TL;DR
This work evaluates how well pre-trained BERT-based models extract definitional sentences from mathematical LaTeX text, framing the task as binary classification. Using two LaTeX-based datasets, Chicago and TAC, and a data-cleaning and processing pipeline, the authors compare MathBERT, MathBERT-custom, and Sentence-BERT (SBERT). SBERT delivers the strongest overall performance and generalization within the mathematical domain. The authors examine dataset-specific effects, oversampling strategies, and cross-dataset generalization, including tests on the WFMALL dataset. They find that SBERT, particularly when trained with oversampling, offers robust accuracy and recall and outperforms prior results in several settings. The findings suggest that fine-tuning existing pre-trained models can yield effective, scalable tools for definition extraction, contributing to more searchable and accessible mathematical knowledge bases such as MathGloss. The study also highlights how dataset characteristics, such as definition density, shape model performance and generalization.
Abstract
In this paper, we fine-tuned three pre-trained BERT models on the task of "definition extraction" from mathematical English written in LaTeX. We frame this as a binary classification problem: a sentence either contains a definition of a mathematical term or it does not. We used two original datasets, "Chicago" and "TAC," to fine-tune and test these models. We also tested on WFMALL, a dataset presented by Vanetik and Litvak in 2021, and compared the performance of our models to theirs. We found that a high-performance Sentence-BERT transformer model performed best on overall accuracy, recall, and precision, achieving results comparable to the earlier models with less computational effort.
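To make the evaluation criteria concrete, the following is a minimal sketch of how accuracy, precision, and recall can be computed for a binary "definition / not a definition" classifier. The function name `binary_metrics` and the label convention (1 for a sentence containing a definition, 0 otherwise) are assumptions made for illustration; this is not the evaluation code used in the paper.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, and recall for binary labels.

    Assumed convention (not taken from the paper): 1 means the sentence
    contains a definition, 0 means it does not.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    return {
        # Fraction of all sentences classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Of the sentences flagged as definitions, how many really are.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the true definitions, how many were found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the paper.
    gold = [1, 0, 1, 1, 0, 0]
    predicted = [1, 0, 0, 1, 1, 0]
    print(binary_metrics(gold, predicted))
```

Recall matters in this setting because definitional sentences are typically a minority of the sentences in a text, so a classifier can reach high accuracy while still missing many definitions.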
