Symbolic Equation Solving via Reinforcement Learning

Lennart Dabelow, Masahito Ueda

TL;DR

This paper proposes a novel deep-learning interface in which a reinforcement-learning agent operates a symbolic stack calculator to explore mathematical relations, and demonstrates how the agent autonomously discovers elementary transformation rules and step-by-step solutions.

Abstract

Machine-learning methods are gradually being adopted in a wide variety of social, economic, and scientific contexts, yet they are notorious for struggling with exact mathematics. A typical example is computer algebra, which includes tasks like simplifying mathematical terms, calculating formal derivatives, or finding exact solutions of algebraic equations. Traditional software packages for these purposes are commonly based on a huge database of rules for how a specific operation (e.g., differentiation) transforms a certain term (e.g., sine function) into another one (e.g., cosine function). These rules have usually needed to be discovered and subsequently programmed by humans. Efforts to automate this process by machine-learning approaches are faced with challenges like the singular nature of solutions to mathematical problems, when approximations are unacceptable, as well as hallucination effects leading to flawed reasoning. We propose a novel deep-learning interface involving a reinforcement-learning agent that operates a symbolic stack calculator to explore mathematical relations. By construction, this system is capable of exact transformations and immune to hallucination. Using the paradigmatic example of solving linear equations in symbolic form, we demonstrate how our reinforcement-learning agent autonomously discovers elementary transformation rules and step-by-step solutions.

Paper Structure (24 sections, 16 equations, 7 figures, 16 tables)

Figures (7)

  • Figure 1: Schematic illustration of the reinforcement-learning approach. (a) An agent interacts with its environment and is supposed to solve a specific task. (b) Learning proceeds via a feedback loop: The agent makes an observation $s_t = v(q_t)$ of the state $q_t$ of the environment and decides on its next action $a_t$ according to the policy $\pi(a_t \,|\, s_t)$ (conditional probability distribution). The environment responds to the action by changing its state and issuing a reward $r_t$ that reflects how close the new state is to the desired goal. The agent updates its policy with the aim of maximizing the accumulated rewards.
  • Figure 2: Main components of the reinforcement-learning framework for symbolic equation solving. (a) The state comprises the left- and right-hand sides of the equation as well as a stack of up to $S$ additional terms. Every term consists of (up to) $T$ elementary units (i.e., operators, variables, numbers, parentheses). (b) The agent can choose from four classes of actions: copy (sub)terms from the equation to the stack ($2T$ actions); perform an equivalence transformation of the equation ($O_{\mathrm{eq}}$); push a predefined numerical constant to the stack ($C_{\mathrm{num}}$); or apply a mathematical operation on the stack ($O_{\mathrm{st}}$). (c) Examples of how the state changes under each of the four classes of actions.
  • Figure 3: Test success and solution strategies for equations of type (\ref{eq:Problem:LinEqNumCoeff}) with (a) real-valued coefficients or (b--d) complex-valued coefficients. (a, b) Test success and average number of elementary steps to success vs. training time $\tau$ for various test data sets (see column heading) and maximum number of steps $t_{\max} = 100$. Average number of steps (bottom row) only shown if test success is $\geq 2\,\%$. (a) Solid (R1): training with integer $a_i$; dashed (R2): training with rational $a_i$. (b) Solid (C1): training with real-integer $a_i$; dashed (C2): training with complex-integer $a_i$. (c) Summary graph for the solution strategy of (C2) after $\tau = 1.955 \!\times\! 10^7$ training epochs, analyzed for the complex-rational test data set. States are collected into superstates according to the form of the equation, disregarding left-right symmetry and the stack. Disk sectors and numbers (in purple) indicate the relative number of elementary steps spent in the respective superstates (in $\%$, only shown if $\geq 1\,\%$). Transitions between superstates are indicated by arrows, with their relative frequencies compared to all transitions from a fixed state given by the numbers in red (in $\%$, only shown if $\geq 1\,\%$). Special states: timeout: SymPy processing of the selected action exceeds $10\,\mathrm{s}$; bad: state cannot be represented as a neural-network tensor (e.g., term size exceeds $T$, numerical overflow). (d) Example sequence of elementary transformations suggested by the network from (c) to solve $-\frac{1}{5} + \frac{3}{4} x = \frac{5}{8} + 2x$.
  • Figure 4: Test success and average number of steps for equations of type (\ref{eq:Problem:LinEqSymCoeff}) with mixed numerical (real-valued) and symbolic coefficients. Test types: see column headings; maximum number of steps: $t_{\max} = 100$. Solid (S1): training with $a_i, b_i \in \mathbb{Z}$ and $a_0 = b_0 = a_3 = b_3 = 0$ or $a_1 = b_1 = a_2 = b_2 = 0$, $p_0 = 0$, learning rate $\eta = 0.05$; dotted (S2): training with $a_i, b_i \in \mathbb{Z}$, $p_0 = 1/2$, $\eta = 0.05$; dash-dotted (S3): training with $a_i, b_i \in \mathbb{Z}$, $p_0 = 2/3$, $\eta = 0.01$; dashed (S4): training with $a_i, b_i \in \mathbb{Q}$, $p_0 = 2/3$, $\eta = 0.05$; dash-dot-dotted (S5): training with $a_i, b_i \in \mathbb{Q}$, $p_0 = 2/3$, $\eta = 0.01$. Average number of steps for successful transformations (bottom row) only shown if test success is $\geq 2\,\%$.
  • Figure 5: Characteristics of the adversarial reinforcement-learning approach. (a) Test success and average number of steps for equations with purely numerical coefficients (test types: see column headings). Generator initialization: $x = a \in \mathbb{Q}$ (see also Appendix \ref{app:Implementation:TaskSampling}). (b) Histograms of the equation patterns produced by the generator (blue) and the fraction of successfully recovered solutions by the solver (orange). (c) Test success and average number of steps for equations with mixed symbolic and numerical coefficients (test types: see column headings). Solid (AS1): adversarial training, generator initialization $x = (a_1 + b_1 c) / (a_2 + b_2 c)$ with $a_i, b_i \in \mathbb{Q}$, $p_0 = 1/2$; dashed (AS2): fixed-distribution training with equation type (\ref{eq:Problem:LinEqSymCoeff}), $a_i, b_i \in \mathbb{Q}$, $p_0 = 2/3$. Learning rates and greediness are adapted according to online estimates of the learning progress, see Appendix \ref{app:Implementation:Schedules} for details. Average number of steps for successful transformations (bottom row in (a) and (c)) only shown if test success is $\geq 2\,\%$.
  • ...and 2 more figures
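
The state and the four action classes described for Figure 2, together with the example equation of Figure 3(d), can be illustrated with a small sketch. The code below is a toy under stated assumptions, not the authors' implementation: it restricts terms to the linear form const + coeff * x with exact rational coefficients from Python's standard `fractions` module (the Figure 3 caption indicates that the original system relies on SymPy), and the names `Lin`, `State`, `copy`, `push_constant`, `stack_op`, `eq_transform`, and `solve_example` are hypothetical. The action sequence is chosen by hand in place of a learned policy and is not claimed to match the sequence shown in Figure 3(d).

```python
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass(frozen=True)
class Lin:
    """Toy term: const + coeff * x with exact rational coefficients."""
    const: Fraction = Fraction(0)
    coeff: Fraction = Fraction(0)

    def __add__(self, other):
        return Lin(self.const + other.const, self.coeff + other.coeff)

    def __sub__(self, other):
        return Lin(self.const - other.const, self.coeff - other.coeff)

    def scaled(self, factor):
        return Lin(self.const * factor, self.coeff * factor)


@dataclass
class State:
    """Equation lhs = rhs plus a stack of auxiliary terms (cf. Figure 2a)."""
    lhs: Lin
    rhs: Lin
    stack: list = field(default_factory=list)


def copy(state, side, part):
    """Action class: copy a (sub)term of the equation to the stack."""
    src = getattr(state, side)  # side is "lhs" or "rhs"
    parts = {
        "const": Lin(const=src.const),   # x-independent term
        "x_term": Lin(coeff=src.coeff),  # coeff * x
        "coeff": Lin(const=src.coeff),   # bare coefficient of x
    }
    state.stack.append(parts[part])


def push_constant(state, value):
    """Action class: push a predefined numerical constant to the stack."""
    state.stack.append(Lin(const=Fraction(value)))


def stack_op(state, op):
    """Action class: apply a mathematical operation on the stack."""
    if op == "neg":
        state.stack.append(state.stack.pop().scaled(-1))
    elif op == "add":
        b, a = state.stack.pop(), state.stack.pop()
        state.stack.append(a + b)
    else:
        raise ValueError(f"unknown stack operation: {op}")


def eq_transform(state, op):
    """Action class: apply the stack top to both sides of the equation."""
    top = state.stack[-1]
    if op == "sub":
        new_lhs, new_rhs = state.lhs - top, state.rhs - top
    elif op == "div":
        # Only division by a nonzero number preserves equivalence here.
        if top.coeff != 0 or top.const == 0:
            raise ValueError("can only divide by a nonzero constant")
        inv = 1 / top.const
        new_lhs, new_rhs = state.lhs.scaled(inv), state.rhs.scaled(inv)
    else:
        raise ValueError(f"unknown equation operation: {op}")
    state.stack.pop()
    state.lhs, state.rhs = new_lhs, new_rhs


def is_solved(state):
    """Solved means the equation reads x = <number>."""
    return state.lhs == Lin(coeff=Fraction(1)) and state.rhs.coeff == 0


def solve_example():
    """Hand-picked actions for -1/5 + 3/4 x = 5/8 + 2 x."""
    state = State(lhs=Lin(Fraction(-1, 5), Fraction(3, 4)),
                  rhs=Lin(Fraction(5, 8), Fraction(2)))
    copy(state, "rhs", "x_term")   # stack: [2 x]
    eq_transform(state, "sub")     # -1/5 - 5/4 x = 5/8
    copy(state, "lhs", "const")    # stack: [-1/5]
    eq_transform(state, "sub")     # -5/4 x = 33/40
    copy(state, "lhs", "coeff")    # stack: [-5/4]
    eq_transform(state, "div")     # x = -33/50
    return state


if __name__ == "__main__":
    final = solve_example()
    print(final.lhs, final.rhs, is_solved(final))
```

Every step uses exact rational arithmetic, so the result is exact by construction: the final state reads x = -33/50, and substituting this value back into the original equation gives -139/200 on both sides. In a reinforcement-learning setting, the hand-written sequence in `solve_example` would be replaced by actions sampled from the agent's policy.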