Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients

Brenden K. Petersen, Mikel Landajuela, T. Nathan Mundhenk, Claudio P. Santiago, Soo K. Kim, Joanne T. Kim

TL;DR

This paper tackles the problem of symbolic regression by introducing Deep Symbolic Regression (DSR), a framework that uses a large autoregressive neural network to search the space of concise mathematical expressions and trains it with a risk-seeking policy gradient to optimize for best-case, high-quality solutions. Expressions are generated as pre-order traversals of expression trees, with in-situ constraints and optional constant optimization guiding the search, and rewards are computed via $R(\tau)=\frac{1}{1+\mathrm{NRMSE}}$. Empirically, DSR outperforms multiple baselines, including GP-based methods and commercial tools, on the Nguyen benchmark suite, especially in exact symbolic recovery, while demonstrating robustness to noise and favorable runtimes with early stopping. The approach offers a general framework for optimizing hierarchical, variable-length objects under black-box metrics and suggests broad applicability to AutoML, program synthesis, and other domains where best-case performance is paramount.
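The reward mapping mentioned above can be illustrated with a short sketch. It assumes that NRMSE means the root-mean-square error divided by the standard deviation of the target values; the function names `nrmse` and `reward` and the sample data are illustrative choices, not taken from the authors' code.

```python
import numpy as np

def nrmse(y_true, y_pred):
    """RMSE normalized by the standard deviation of the targets (assumed convention)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / np.std(y_true)

def reward(y_true, y_pred):
    """Squash NRMSE into (0, 1]; a perfect fit scores exactly 1."""
    return 1.0 / (1.0 + nrmse(y_true, y_pred))

# Illustrative data: a cubic polynomial target sampled on [-1, 1].
x = np.linspace(-1.0, 1.0, 20)
y = x**3 + x**2 + x

perfect = reward(y, x**3 + x**2 + x)  # candidate equal to the target
rough = reward(y, x)                  # candidate missing two terms
```

Because the reward is bounded in (0, 1], a badly fitting expression with a huge error contributes a reward near 0 rather than an unbounded penalty.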

Abstract

Discovering the underlying mathematical expressions describing a dataset is a core challenge for artificial intelligence. This is the problem of $\textit{symbolic regression}$. Despite recent advances in training neural networks to solve complex tasks, deep learning approaches to symbolic regression are underexplored. We propose a framework that leverages deep learning for symbolic regression via a simple idea: use a large model to search the space of small models. Specifically, we use a recurrent neural network to emit a distribution over tractable mathematical expressions and employ a novel risk-seeking policy gradient to train the network to generate better-fitting expressions. Our algorithm outperforms several baseline methods (including Eureqa, the gold standard for symbolic regression) in its ability to exactly recover symbolic expressions on a series of benchmark problems, both with and without added noise. More broadly, our contributions include a framework that can be applied to optimize hierarchical, variable-length objects under a black-box performance metric, with the ability to incorporate constraints in situ, and a risk-seeking policy gradient formulation that optimizes for best-case performance instead of expected performance.

Paper Structure

This paper contains 14 sections, 1 theorem, 19 equations, 9 figures, 11 tables, 3 algorithms.

Key Result

Proposition 1

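Let $J_\textrm{risk}(\theta; \varepsilon)$ denote the conditional expectation of rewards above the $(1 - \varepsilon)$-quantile $R_\varepsilon(\theta)$ of the reward distribution under the current policy. Then the gradient of $J_\textrm{risk}(\theta; \varepsilon)$ is given by:

$$\nabla_\theta J_\textrm{risk}(\theta; \varepsilon) = \mathbb{E}_{\tau \sim p(\tau \mid \theta)}\left[ \left( R(\tau) - R_\varepsilon(\theta) \right) \nabla_\theta \log p(\tau \mid \theta) \;\middle|\; R(\tau) \geq R_\varepsilon(\theta) \right]$$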
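A Monte Carlo estimate of such a gradient can be sketched as follows. The sketch assumes the gradient weights each sample's score function $\nabla_\theta \log p(\tau \mid \theta)$ by its reward minus the empirical $(1 - \varepsilon)$-quantile, and discards samples below that quantile. The helper names and the toy inputs are hypothetical, and the per-sample score vectors are taken as given rather than computed from a real network.

```python
import numpy as np

def risk_seeking_weights(rewards, epsilon):
    """Per-sample weights for a batch: zero below the empirical quantile."""
    rewards = np.asarray(rewards, dtype=float)
    # Empirical (1 - epsilon)-quantile of the batch rewards.
    threshold = np.quantile(rewards, 1.0 - epsilon)
    keep = rewards >= threshold
    # The quantile acts as a baseline; samples below it get zero weight.
    weights = np.where(keep, rewards - threshold, 0.0)
    # Normalize by the expected number of retained samples, epsilon * N.
    return threshold, keep, weights / (epsilon * len(rewards))

def gradient_estimate(rewards, grad_log_probs, epsilon):
    """Weighted sum of score vectors; grad_log_probs has shape (N, n_params)."""
    _, _, weights = risk_seeking_weights(rewards, epsilon)
    return weights @ np.asarray(grad_log_probs, dtype=float)

# Toy batch of five rewards; identity rows stand in for score vectors.
rewards = [0.1, 0.2, 0.5, 0.9, 0.95]
threshold, keep, weights = risk_seeking_weights(rewards, epsilon=0.4)
estimate = gradient_estimate(rewards, np.eye(5), epsilon=0.4)
```

Only the top fraction of the batch influences the update, which is what distinguishes this estimator from a standard policy gradient that averages over every sample.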

Figures (9)

  • Figure 1: A. Example of sampling an expression from the RNN. For each token, the RNN emits a categorical distribution over tokens, a token is sampled, and the parent and sibling of the next token are used as the next input to the RNN. Subsequent tokens are sampled autoregressively until the tree is complete (i.e. all tree branches reach terminal nodes). The resulting sequence of tokens is the tree's pre-order traversal, which can be used to reconstruct the tree and instantiate its corresponding expression. Colors correspond to the number of children for each token. White circles represent empty tokens. Numbers indicate the order in which tokens were sampled. B. The library of tokens. C. The expression tree sampled in A. In this example, the sampled expression is $\sin(cx)/\log(y)$, where the value of the constant $c$ is optimized with respect to an input dataset.
  • Figure 2: A - D. Empirical reward distributions for Nguyen-8. Each curve is a Gaussian kernel density estimate (bandwidth 0.25) of the rewards for a particular training iteration, using either the full batch of expressions (A and C) or the top $\varepsilon$ fraction of the batch (B and D), averaged over all training runs. Black plots (A and B) were trained using the risk-seeking policy gradient objective. Blue plots (C and D) were trained using the standard policy gradient objective. Colorbars indicate training step. Triangle markings denote the empirical mean of the distribution at the final training step. E. Training curves for mean reward of full batch (dotted), mean reward of top $\varepsilon$ fraction of the batch (dashed), and best expression found so far (solid), averaged over all training runs.
  • Figure 3: Recovery for various ablations of the DSR algorithm across all Nguyen benchmarks. Error bars represent standard error.
  • Figure 4: Recovery vs dataset noise and dataset size across all Nguyen benchmarks. Error bars represent standard error.
  • Figure 5: Recovery vs added reward noise on Nguyen-4. Error bars represent standard error.
  • ...and 4 more figures
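
The pre-order representation described in the Figure 1 caption can be illustrated with a small sketch. It checks whether a token sequence forms a complete tree and evaluates the corresponding expression. The token library, the names `is_complete` and `evaluate`, and the chosen input values are hypothetical; the constant `c` is supplied directly here rather than optimized against data.

```python
import math

# Hypothetical token library: name -> (number of children, function).
LIBRARY = {
    "add": (2, lambda a, b: a + b),
    "mul": (2, lambda a, b: a * b),
    "div": (2, lambda a, b: a / b),
    "sin": (1, math.sin),
    "log": (1, math.log),
    "x": (0, None),
    "y": (0, None),
    "c": (0, None),
}

def is_complete(tokens):
    """True when every branch ends in a terminal and no tokens are left over."""
    open_slots = 1  # the root is the one slot waiting to be filled
    for tok in tokens:
        if open_slots == 0:
            return False  # tokens continue after the tree has closed
        open_slots += LIBRARY[tok][0] - 1
    return open_slots == 0

def evaluate(tokens, env):
    """Evaluate a complete pre-order traversal given values for the terminals."""
    def walk(pos):
        arity, fn = LIBRARY[tokens[pos]]
        if arity == 0:
            return env[tokens[pos]], pos + 1
        pos += 1
        args = []
        for _ in range(arity):
            val, pos = walk(pos)
            args.append(val)
        return fn(*args), pos
    value, _ = walk(0)
    return value

# Pre-order traversal of sin(c * x) / log(y), the expression in Figure 1.
traversal = ["div", "sin", "mul", "c", "x", "log", "y"]
value = evaluate(traversal, {"c": 2.0, "x": 0.5, "y": math.e})
```

Tracking the count of unfilled child slots is enough to know when sampling should stop, which is why a flat token sequence can stand in for the full tree.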

Theorems & Definitions (2)

  • Proposition 1
  • proof