AlphaIntegrator: Transformer Action Search for Symbolic Integration Proofs

Mert Ünsal; Timon Gehr; Martin Vechev

AlphaIntegrator: Transformer Action Search for Symbolic Integration Proofs

Mert Ünsal, Timon Gehr, Martin Vechev

TL;DR

This work introduces a symbolic engine with axiomatically correct actions on mathematical expressions, as well as the first dataset for step-by-step integration, and presents the first correct-by-construction learning-based system for step-by-step mathematical integration.

Abstract

We present the first correct-by-construction learning-based system for step-by-step mathematical integration. The key idea is to learn a policy, represented by a GPT transformer model, which guides the search for the right mathematical integration rule, to be carried out by a symbolic solver. Concretely, we introduce a symbolic engine with axiomatically correct actions on mathematical expressions, as well as the first dataset for step-by-step integration. Our GPT-style transformer model, trained on this synthetic data, demonstrates strong generalization by surpassing its own data generator in accuracy and efficiency, using 50% fewer search steps. Our experimental results with SoTA LLMs also demonstrate that the standard approach of fine-tuning LLMs on a set of question-answer pairs is insufficient for solving this mathematical task. This motivates the importance of discovering creative methods for combining LLMs with symbolic reasoning engines, of which our work is an instance.

AlphaIntegrator: Transformer Action Search for Symbolic Integration Proofs

TL;DR

Abstract

Paper Structure (35 sections, 2 equations, 7 figures, 3 tables, 1 algorithm)

This paper contains 35 sections, 2 equations, 7 figures, 3 tables, 1 algorithm.

Introduction
Mathematical Integration
Our Work: correct-by-construction learning-based integration
Main contributions
Symbolic Engine with Parametric Action Space
Representation of Mathematical Expressions and Theorems
Tree to Sequence Equivalence and Parsing
Tree to Sequence
Sequence to Tree
Tokenization
Generating A Synthetic Mathematical Integration Dataset
Random Mathematical Expressions
Steps of Integration
Data Augmentation with Integration By Parts
Model Architecture and Training Objective
...and 20 more sections

Figures (7)

Figure 1: The symbolic engine takes in the subexpression $\int x(4x + \log x - 2) dx$ and applies the substitution rule with $y = \log x$. The symbolic engine will realize the rule by first differentiating $y(x)$ and dividing the integrand by $y'$. Then, it will substitute by $y$ wherever it observes $y(x)$ and then solve for $x = g(y)$ to substitute the remaining terms.
Figure 2: Tree representations of the expressions $\frac{1}{x+3} + 2\cosh^2(x)$ and $\int x^2 e^x dx$
Figure 3: Inference Loop for Integration
Figure 4: Step-by-step solution generated by AlphaIntegrator.
Figure 5: Distribution of number of nodes explored in tree search for AlphaIntegrator vs SymPy.
...and 2 more figures

AlphaIntegrator: Transformer Action Search for Symbolic Integration Proofs

TL;DR

Abstract

AlphaIntegrator: Transformer Action Search for Symbolic Integration Proofs

Authors

TL;DR

Abstract

Table of Contents

Figures (7)