Symbolic Integration Algorithm Selection with Machine Learning: LSTMs vs Tree LSTMs
Rashid Barket, Matthew England, Jürgen Gerhard
TL;DR
The paper addresses the problem of choosing among Maple's $12$ sub-algorithms for symbolic indefinite integration so as to minimize the length of the output expression, framing the choice as a multi-label prediction problem. It compares a sequence-based LSTM and a TreeLSTM that encodes expression DAGs (directed acyclic graphs) against Maple's meta-algorithm, and finds that the TreeLSTM substantially outperforms both on a held-out test set. A data-generation framework using five methods yields $100{,}000$ labeled integrands, and validation on independent data (the Maple test suite) shows that the TreeLSTM generalizes beyond its training data. The work demonstrates that tree-based representations of mathematics improve sub-algorithm selection, and it suggests that practical integration into Maple is feasible with more data and further hyperparameter optimization.
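To make the multi-label framing concrete, the following is a minimal Python sketch of how labels could be derived from output lengths. It assumes that every sub-algorithm tying for the shortest output counts as a correct choice; the function name, the sub-algorithm names, and the lengths are hypothetical and not taken from the paper.

```python
from typing import Dict, Optional


def shortest_output_labels(lengths: Dict[str, Optional[int]]) -> Dict[str, int]:
    """Label each sub-algorithm 1 if its output is as short as the best one.

    `lengths` maps a (hypothetical) sub-algorithm name to the length of the
    expression it returned, or None if it failed on this integrand.
    """
    successful = [n for n in lengths.values() if n is not None]
    if not successful:
        # No sub-algorithm succeeded, so every label is 0.
        return {name: 0 for name in lengths}
    best = min(successful)
    return {name: int(n is not None and n == best) for name, n in lengths.items()}


# Illustrative values only: two sub-algorithms tie for the shortest output.
example = {"method_a": 14, "method_b": 9, "method_c": None, "method_d": 9}
labels = shortest_output_labels(example)
```

Because several sub-algorithms can tie, more than one label may be set for a single integrand, which is what distinguishes this from single-class classification.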
Abstract
Computer Algebra Systems (e.g. Maple) are used in research, education, and industrial settings. One of their key functionalities is symbolic integration, where there are many sub-algorithms to choose from that can affect the form of the output integral, and the runtime. Choosing the right sub-algorithm for a given problem is challenging: we hypothesise that Machine Learning can guide this sub-algorithm choice. A key consideration of our methodology is how to represent the mathematics to the ML model: we hypothesise that a representation which encodes the tree structure of mathematical expressions would be well suited. We trained both an LSTM and a TreeLSTM model for sub-algorithm prediction and compared them to Maple's existing approach. Our TreeLSTM performs much better than the LSTM, highlighting the benefit of using an informed representation of mathematical expressions. It is able to produce better outputs than Maple's current state-of-the-art meta-algorithm, giving a strong basis for further research.
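The difference between the two representations can be illustrated with a short Python sketch. It is an assumption-laden illustration, not the encoding used in the paper: the `Node` class, the prefix-order flattening, and the example expression are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One operator, function, variable, or constant in an expression tree."""
    label: str
    children: List["Node"] = field(default_factory=list)


def to_prefix(node: Node) -> List[str]:
    """Flatten the tree into a prefix-order token list (sequence-model input)."""
    tokens = [node.label]
    for child in node.children:
        tokens.extend(to_prefix(child))
    return tokens


def depth(node: Node) -> int:
    """Number of nodes on the longest root-to-leaf path."""
    return 1 + max((depth(child) for child in node.children), default=0)


# The expression x * sin(x) + 1 as a tree.
expr = Node("+", [Node("*", [Node("x"), Node("sin", [Node("x")])]), Node("1")])

prefix_tokens = to_prefix(expr)  # ['+', '*', 'x', 'sin', 'x', '1']
tree_depth = depth(expr)         # 4, along the path + -> * -> sin -> x
```

A sequence model sees only the flat token list, so it has to infer which operands belong to which operator. A tree-structured model receives the parent-child links directly.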
