Graph-Eq: Discovering Mathematical Equations using Graph Generative Models
Nisal Ranasinghe, Damith Senanayake, Saman Halgamuge
TL;DR
Graph-Eq introduces a graph-based equation discovery framework by encoding mathematical expressions as directed acyclic graphs and learning a structured latent space with a graph conditional variational autoencoder. The model conditions the encoder/decoder on dataset embeddings that capture functional properties, enabling Bayesian optimization to efficiently search for equations that minimize $\text{MSE}$ on target data. Empirical results show improved latent-space quality and a higher ground-truth recovery rate (11/20) for conditional VAEs compared to a vanilla DVAE (8/20), demonstrating the value of functional conditioning. A key limitation is the current inability to represent numerical constants within the DAG representation, pointing to future work on extending the graph structure to include constants and constant-aware operators.
Abstract
The ability to discover meaningful, accurate, and concise mathematical equations that describe datasets is valuable across various domains. Equations offer explicit relationships between variables, enabling deeper insights into underlying data patterns. Most existing equation discovery methods rely on genetic programming, which iteratively searches the equation space but is often slow and prone to overfitting. By representing equations as directed acyclic graphs, we use graph neural networks to learn the underlying semantics of equations and to generate new, previously unseen equations. Although graph generative models have been successful in discovering new types of graphs in many fields, their application to equation discovery remains largely unexplored. In this work, we propose Graph-Eq, a deep graph generative model designed for efficient equation discovery. Graph-Eq uses a conditional variational autoencoder (CVAE) to learn a rich latent representation of the equation space by training on a large corpus of equations in an unsupervised manner. Instead of directly searching the equation space, we employ Bayesian optimization to efficiently explore this learned latent space. We show that the encoder-decoder architecture of Graph-Eq is able to accurately reconstruct input equations. Moreover, we show that the learned latent representation can be sampled and decoded into valid equations, including new equations not present in the training data. Finally, we assess Graph-Eq's ability to discover equations that best fit a dataset by exploring the latent space using Bayesian optimization. Latent space exploration is performed on 20 datasets with known ground-truth equations, and Graph-Eq successfully discovers the ground-truth equation in the majority of datasets.
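To make the graph view of an equation concrete, the following is a minimal sketch of how an expression such as x*sin(x) + x could be stored as a directed acyclic graph and scored against a dataset by mean squared error. It is an illustration only, not the encoding used by Graph-Eq: the `Node` class, the operator tables, and the `evaluate` and `mse` helpers are hypothetical names introduced here, and the sketch assumes a single input variable. In keeping with the limitation noted above, it has no node type for numerical constants.

```python
import math
from dataclasses import dataclass

# Hypothetical node type for illustration; not the paper's actual encoding.
@dataclass(frozen=True)
class Node:
    op: str              # "x" for the input variable, otherwise an operator name
    inputs: tuple = ()   # indices of earlier nodes, so the graph cannot contain cycles

# Assumed operator vocabulary (no numerical constants).
UNARY = {"sin": math.sin, "cos": math.cos, "exp": math.exp}
BINARY = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def evaluate(dag, x):
    """Evaluate a DAG given as a topologically ordered node list at the point x."""
    values = []
    for node in dag:
        if node.op == "x":
            values.append(x)
        elif node.op in UNARY:
            (i,) = node.inputs
            values.append(UNARY[node.op](values[i]))
        elif node.op in BINARY:
            i, j = node.inputs
            values.append(BINARY[node.op](values[i], values[j]))
        else:
            raise ValueError(f"unknown operator: {node.op}")
    return values[-1]  # the last node is treated as the output

def mse(dag, xs, ys):
    """Mean squared error between the DAG's predictions and the targets."""
    return sum((evaluate(dag, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# x*sin(x) + x: node 0 (the variable) is shared by three consumers,
# which is what makes this a DAG rather than an expression tree.
candidate = [
    Node("x"),
    Node("sin", (0,)),
    Node("mul", (0, 1)),
    Node("add", (2, 0)),
]

xs = [0.0, 0.5, 1.0, 2.0]
ys = [x * math.sin(x) + x for x in xs]  # synthetic data from the same equation
fit_error = mse(candidate, xs, ys)
```

In a latent-space search, a score like `fit_error` would play the role of the objective: each candidate latent point is decoded into a graph, the graph is evaluated on the data, and the optimizer uses the resulting error to choose where to sample next.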
