Symbolic Integration Algorithm Selection with Machine Learning: LSTMs vs Tree LSTMs

Rashid Barket, Matthew England, Jürgen Gerhard

TL;DR

The paper tackles selecting Maple's $12$ sub-algorithms for symbolic indefinite integration to minimize the output expression length, framing the choice as a multi-label prediction problem. It compares a sequence-based LSTM and a tree-structured TreeLSTM that encodes expression DAGs against Maple's meta-algorithm, finding that TreeLSTM substantially outperforms the others on a held-out test set. A data-generation framework using five methods yields $100{,}000$ labeled integrands, with validation on independent data (Maple test suite) showing the TreeLSTM generalizes beyond training data. The work demonstrates that tree-based representations of mathematics improve sub-algorithm selection and suggests practical integration into Maple after further data and hyperparameter optimization.
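The labeling idea in this summary (prefer whichever sub-algorithm yields the shortest output) can be sketched as follows. This is only an illustration under stated assumptions: `expression_length` is a hypothetical token-count proxy rather than the size measure used in the paper, the sub-algorithm names and output strings are made up, and treating ties as several positive labels is one possible reading of "multi-label".

```python
import re
from typing import Dict, Optional


def expression_length(expr: str) -> int:
    """Hypothetical size proxy: count identifiers, numbers, and symbols."""
    return len(re.findall(r"[A-Za-z_]\w*|\d+\.?\d*|[^\s\w]", expr))


def optimal_labels(outputs: Dict[str, Optional[str]]) -> Dict[str, int]:
    """Mark with 1 every sub-algorithm whose output is shortest.

    `outputs` maps a (hypothetical) sub-algorithm name to its answer,
    or to None when that sub-algorithm failed on the integrand.
    """
    sizes = {
        name: expression_length(out)
        for name, out in outputs.items()
        if out is not None
    }
    if not sizes:  # every sub-algorithm failed
        return {name: 0 for name in outputs}
    best = min(sizes.values())
    # Failed sub-algorithms have no size, so they never equal `best`.
    return {name: int(sizes.get(name) == best) for name in outputs}


# Made-up outputs for one integrand; not taken from Maple.
example_outputs = {
    "method_a": "sin(x) - x*cos(x)",
    "method_b": "-x*cos(x) + sin(x)",
    "method_c": None,
}
example_labels = optimal_labels(example_outputs)
```

A classifier trained on such label vectors would then be asked to predict, for a new integrand, which sub-algorithm to try first.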

Abstract

Computer Algebra Systems (e.g. Maple) are used in research, education, and industrial settings. One of their key functionalities is symbolic integration, where there are many sub-algorithms to choose from that can affect the form of the output integral, and the runtime. Choosing the right sub-algorithm for a given problem is challenging: we hypothesise that Machine Learning can guide this sub-algorithm choice. A key consideration of our methodology is how to represent the mathematics to the ML model: we hypothesise that a representation which encodes the tree structure of mathematical expressions would be well suited. We trained both an LSTM and a TreeLSTM model for sub-algorithm prediction and compared them to Maple's existing approach. Our TreeLSTM performs much better than the LSTM, highlighting the benefit of using an informed representation of mathematical expressions. It is able to produce better outputs than Maple's current state-of-the-art meta-algorithm, giving a strong basis for further research.
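To make the representation question concrete, the sketch below builds both a flat token sequence (the kind of input a sequence LSTM reads) and a nested tree (the kind of input a TreeLSTM reads) for the same expression. It is an assumption-laden illustration: Python's `ast` module stands in for Maple's internal expression format, and the helper names `to_tree`, `to_prefix`, and `depth` are hypothetical.

```python
import ast
from typing import List, Tuple

# A tree node is (label, list_of_child_nodes).
Tree = Tuple[str, list]

# Labels for Python's binary operators; input must use ** for powers.
_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/", ast.Pow: "^"}


def to_tree(node: ast.AST) -> Tree:
    """Convert a parsed Python expression into a (label, children) tree."""
    if isinstance(node, ast.Expression):
        return to_tree(node.body)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return (_OPS[type(node.op)], [to_tree(node.left), to_tree(node.right)])
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return ("neg", [to_tree(node.operand)])
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return (node.func.id, [to_tree(arg) for arg in node.args])
    if isinstance(node, ast.Name):
        return (node.id, [])
    if isinstance(node, ast.Constant):
        return (str(node.value), [])
    raise ValueError(f"unsupported node: {type(node).__name__}")


def to_prefix(tree: Tree) -> List[str]:
    """Flatten the tree into a prefix-order token sequence."""
    label, children = tree
    tokens = [label]
    for child in children:
        tokens.extend(to_prefix(child))
    return tokens


def depth(tree: Tree) -> int:
    """Number of levels in the tree (a leaf has depth 1)."""
    _, children = tree
    return 1 + max((depth(child) for child in children), default=0)


example_tree = to_tree(ast.parse("x*sin(x)", mode="eval"))
example_tokens = to_prefix(example_tree)
```

In the flat sequence the link between `sin` and its argument is only implicit in token order, whereas the tree states it directly; that structural information is what a tree-structured model can exploit.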

Paper Structure (19 sections, 1 theorem, 1 equation, 5 figures)

This paper contains 19 sections, 1 theorem, 1 equation, 5 figures.

Key Result

Theorem (Substitution Rule)

If $u = g(x)$ is a differentiable function whose range is an interval $I$ and $f$ is continuous on $I$, then

$$\int f(g(x))\,g'(x)\,dx = \int f(u)\,du.$$

Figures (5)

  • Figure 1: The output of $\int x\sin(x)\,dx$ from three successful sub-algorithms. The optimal output is the shortest expression, produced by the second sub-algorithm.
  • Figure 2: Visual representation of an LSTM (left) and TreeLSTM (right) (Tai et al., 2015).
  • Figure 3: Optimal sub-algorithms for $100$k integrands.
  • Figure 4: The number of times each ML model and Maple's meta-algorithm produced the optimal answer, or came close to it, on the testing dataset.
  • Figure 5: The number of times each ML model and Maple’s meta-algorithm produced the optimal answer, or came close to it, on the Maple Test Suite.

Theorems & Definitions (1)

  • Theorem: Substitution Rule