Table of Contents
Fetching ...

Towards Semantic Markup of Mathematical Documents via User Interaction

Luka Vrečar, Joe Wells, Fairouz Kamareddine

TL;DR

This work addresses the ambiguity inherent in mathematical LaTeX documents by advocating semantic markup through sTeX and proposing a semi-automatic -ification pipeline that (i) derives grammars from existing semantic macro definitions and (ii) parses formulas with these grammars. It combines GLR parsing via parglare with a GUI to disambiguate parse results, enabling conversion of parse trees into semantically annotated macros without imposing heavy learning requirements on authors. Demonstration on an untyped $\\lambda$-term grammar illustrates how grammars can be augmented with types and precedence, and how standardised parse actions yield ASTs that map cleanly to semantic macros. The approach promises practical benefits like hyperlinks to definitions, interoperability with proof systems and computer algebra systems, and enhanced accessibility, while also highlighting limitations such as cyclic grammars and incomplete macro typings and outlining concrete future improvements.

Abstract

Mathematical documents written in LaTeX often contain ambiguities. We can resolve some of them via semantic markup using, e.g., sTeX, which also has other potential benefits, such as interoperability with computer algebra systems, proof systems, and increased accessibility. However, semantic markup is more involved than "regular" typesetting and presents a challenge for authors of mathematical documents. We aim to smooth out the transition from plain LaTeX to semantic markup by developing semi-automatic tools for authors. In this paper we present an approach to semantic markup of formulas by (semi-)automatically generating grammars from existing sTeX macro definitions and parsing mathematical formulas with them. We also present a GUI-based tool for the disambiguation of parse results and showcase its functionality and potential using a grammar for parsing untyped $λ$-terms.

Towards Semantic Markup of Mathematical Documents via User Interaction

TL;DR

This work addresses the ambiguity inherent in mathematical LaTeX documents by advocating semantic markup through sTeX and proposing a semi-automatic -ification pipeline that (i) derives grammars from existing semantic macro definitions and (ii) parses formulas with these grammars. It combines GLR parsing via parglare with a GUI to disambiguate parse results, enabling conversion of parse trees into semantically annotated macros without imposing heavy learning requirements on authors. Demonstration on an untyped -term grammar illustrates how grammars can be augmented with types and precedence, and how standardised parse actions yield ASTs that map cleanly to semantic macros. The approach promises practical benefits like hyperlinks to definitions, interoperability with proof systems and computer algebra systems, and enhanced accessibility, while also highlighting limitations such as cyclic grammars and incomplete macro typings and outlining concrete future improvements.

Abstract

Mathematical documents written in LaTeX often contain ambiguities. We can resolve some of them via semantic markup using, e.g., sTeX, which also has other potential benefits, such as interoperability with computer algebra systems, proof systems, and increased accessibility. However, semantic markup is more involved than "regular" typesetting and presents a challenge for authors of mathematical documents. We aim to smooth out the transition from plain LaTeX to semantic markup by developing semi-automatic tools for authors. In this paper we present an approach to semantic markup of formulas by (semi-)automatically generating grammars from existing sTeX macro definitions and parsing mathematical formulas with them. We also present a GUI-based tool for the disambiguation of parse results and showcase its functionality and potential using a grammar for parsing untyped -terms.
Paper Structure (26 sections, 2 equations, 7 figures, 4 tables)

This paper contains 26 sections, 2 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: The parse tree and AST for $(\abs{\var{x}}{\var{x}})$
  • Figure 2: The main GUI window, with enumerated sections
  • Figure 3: The GUI once a file has been opened
  • Figure 4: Visualising the AST for $\abs{\var{x}, \var{y}}{\app{\var{x}}{\var{y}}}$.
  • Figure 5: A grammar generated from the macros for the
  • ...and 2 more figures