Graph-Eq: Discovering Mathematical Equations using Graph Generative Models

Nisal Ranasinghe, Damith Senanayake, Saman Halgamuge

TL;DR

Graph-Eq introduces a graph-based equation discovery framework by encoding mathematical expressions as directed acyclic graphs and learning a structured latent space with a graph conditional variational autoencoder. The model conditions the encoder/decoder on dataset embeddings that capture functional properties, enabling Bayesian optimization to efficiently search for equations that minimize $\text{MSE}$ on target data. Empirical results show improved latent-space quality and a higher ground-truth recovery rate (11/20) for conditional VAEs compared to a vanilla DVAE (8/20), demonstrating the value of functional conditioning. A key limitation is the current inability to represent numerical constants within the DAG representation, pointing to future work on extending the graph structure to include constants and constants-aware operators.

Abstract

The ability to discover meaningful, accurate, and concise mathematical equations that describe datasets is valuable across various domains. Equations offer explicit relationships between variables, enabling deeper insights into underlying data patterns. Most existing equation discovery methods rely on genetic programming, which iteratively searches the equation space but is often slow and prone to overfitting. By representing equations as directed acyclic graphs, we leverage graph neural networks to learn the underlying semantics of equations and to generate new, previously unseen equations. Although graph generative models have been shown to be successful in discovering new types of graphs in many fields, their application to equation discovery remains largely unexplored. In this work, we propose Graph-Eq, a deep graph generative model designed for efficient equation discovery. Graph-Eq uses a conditional variational autoencoder (CVAE) to learn a rich latent representation of the equation space by training on a large corpus of equations in an unsupervised manner. Instead of directly searching the equation space, we employ Bayesian optimization to efficiently explore this learned latent space. We show that the encoder-decoder architecture of Graph-Eq is able to accurately reconstruct input equations. Moreover, we show that the learned latent representation can be sampled and decoded into valid equations, including new equations that do not appear in the training data. Finally, we assess Graph-Eq's ability to discover equations that best fit a dataset by exploring the latent space using Bayesian optimization. Latent space exploration is performed on 20 datasets with known ground-truth equations, and Graph-Eq is shown to successfully discover the ground-truth equation in the majority of datasets.

Paper Structure

This paper contains 14 sections, 2 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: The architecture of Graph-Eq. The random equation generator generates equations for training. These equations are represented as equation DAGs and, in parallel, used to create a small dataset of $\textbf{x}, y$ pairs. This dataset is then used to calculate a dataset embedding that conditions the VAE encoder and decoder. The VAE decode
  • Figure 2: An example DAG representation of an equation. The intermediate nodes represent operators while the source and sink nodes represent inputs and outputs respectively.
  • Figure 3: The equation discovery pipeline of Graph-Eq. Bayesian optimization is used to efficiently sample points in the latent space until an optimal equation is discovered.
  • Figure 4: A 2D subspace of the latent space learnt by Graph-Eq is visualized to demonstrate the smoothness of the latent space for a single SR dataset. Each point in this subspace is decoded into an equation DAG, and the $1/(1 + \text{MSE})$ score is visualized in the colourmap.
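
The DAG representation in Figure 2 and the $1/(1 + \text{MSE})$ score in Figure 4 can be illustrated with a small sketch. This is not the authors' implementation: the dictionary encoding, the node names (`x0`, `n0`, `y`), the operator set in `OPS`, and the helper functions `evaluate`, `mse`, and `score` are all hypothetical choices made for illustration. Consistent with the limitation noted in the TL;DR, the example equation contains no numerical constants.

```python
import math

# Hypothetical operator vocabulary; the paper's actual operator set may differ.
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "sin": lambda a: math.sin(a),
}

# Assumed encoding: node id -> (label, parent node ids).
# Source nodes are inputs, intermediate nodes are operators, the sink is the output.
# This DAG encodes y = sin(x0) + x0 * x1.
dag = {
    "x0": ("input", []),
    "x1": ("input", []),
    "n0": ("sin", ["x0"]),
    "n1": ("mul", ["x0", "x1"]),
    "n2": ("add", ["n0", "n1"]),
    "y": ("output", ["n2"]),
}

# A competing candidate that encodes y = x0 + x1.
candidate = {
    "x0": ("input", []),
    "x1": ("input", []),
    "n0": ("add", ["x0", "x1"]),
    "y": ("output", ["n0"]),
}


def evaluate(graph, inputs, node="y", cache=None):
    """Evaluate a node by recursively evaluating its parents (memoized)."""
    cache = {} if cache is None else cache
    if node in cache:
        return cache[node]
    label, parents = graph[node]
    if label == "input":
        value = inputs[node]
    elif label == "output":
        value = evaluate(graph, inputs, parents[0], cache)
    else:
        args = [evaluate(graph, inputs, p, cache) for p in parents]
        value = OPS[label](*args)
    cache[node] = value
    return value


def mse(graph, xs, ys):
    """Mean squared error of the equation encoded by graph on (xs, ys)."""
    errors = [(evaluate(graph, x) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


def score(graph, xs, ys):
    """The 1 / (1 + MSE) score: 1 for a perfect fit, approaching 0 for poor fits."""
    return 1.0 / (1.0 + mse(graph, xs, ys))


# Synthetic target data generated from the ground-truth equation.
xs = [{"x0": 0.1 * i, "x1": 1.0 - 0.05 * i} for i in range(20)]
ys = [math.sin(p["x0"]) + p["x0"] * p["x1"] for p in xs]

print("ground-truth DAG score:", score(dag, xs, ys))
print("candidate DAG score:   ", score(candidate, xs, ys))
```

Mapping the MSE into the interval $(0, 1]$ keeps the quantity bounded even when a decoded equation fits very poorly, which is convenient for a colourmap and for an optimizer that maximizes the score over latent points.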