Class Symbolic Regression: Gotta Fit 'Em All
Wassim Tenachi, Rodrigo Ibata, Thibaut L. François, Foivos I. Diakogiannis
TL;DR
Class Symbolic Regression (Class SR) addresses the problem of discovering a single analytic form that simultaneously fits multiple related datasets, allowing dataset-specific (realization) parameters while sharing class-wide parameters. Built on the Φ-SO framework, it combines dimensional-analysis constraints with deep reinforcement learning to search for universal governing laws. The free parameters of each candidate expression are fitted across all realizations with an LBFGS-based optimizer, and the search is guided by a reward derived from the normalized RMSE. The authors demonstrate the approach on a first Class SR benchmark of eight physics-inspired problems and on an astrophysical application that recovers a Milky Way–like NFW potential from stellar stream data, showing higher exact symbolic recovery and greater robustness to noise than traditional single-dataset SR. This work advances interpretable, physics-informed symbolic discovery in multi-dataset settings and offers practical tools for extracting universal laws in complex scientific domains. Key contributions include:
(i) introducing Class SR as a hierarchical extension of Φ-SO for multi-dataset symbolic regression;
(ii) defining a concrete optimization-and-RL loop that jointly tunes class-wide and realization-specific parameters;
(iii) creating a first Class SR benchmark and demonstrating improved performance, especially under measurement noise;
(iv) validating the method with an astrophysical example that yields a concise analytic potential from stellar streams.
Abstract
We introduce 'Class Symbolic Regression' (Class SR), a first framework for automatically finding a single analytical functional form that accurately fits multiple datasets, each realization being governed by its own (possibly) unique set of fitting parameters. This hierarchical framework leverages the common constraint that all the members of a single class of physical phenomena follow a common governing law. Our approach extends the capabilities of our earlier Physical Symbolic Optimization ($\Phi$-SO) framework for Symbolic Regression, which integrates dimensional analysis constraints and deep reinforcement learning for unsupervised symbolic analytical function discovery from data. Additionally, we introduce the first Class SR benchmark, comprising a series of synthetic physical challenges specifically designed to evaluate such algorithms. We demonstrate the efficacy of our novel approach by applying it to these benchmark challenges and showcase its practical utility for astrophysics by successfully extracting an analytic galaxy potential from a set of simulated orbits approximating stellar streams.
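The hierarchical idea above (one functional form, class-wide parameters shared by every realization, plus a parameter set per realization) can be illustrated with a minimal sketch. It is not the Φ-SO implementation. The candidate form y = K x² + k_i, the function names, and averaging the per-realization NRMSE into one score are hypothetical choices made for illustration. The reward is assumed to be 1/(1 + NRMSE), with the RMSE normalized by the standard deviation of y. A closed-form linear least-squares solve stands in for the LBFGS-based fitting mentioned in the TL;DR, which only works here because this toy form is linear in its parameters.

```python
import numpy as np


def fit_class_expression(datasets):
    """Jointly fit the hypothetical candidate y = K * x**2 + k_i.

    datasets: list of (x, y) array pairs, one pair per realization.
    Returns the class-wide K (shared by all realizations) and an array
    of realization-specific offsets k_i.
    """
    n_real = len(datasets)
    blocks, targets = [], []
    for i, (x, y) in enumerate(datasets):
        block = np.zeros((len(x), 1 + n_real))
        block[:, 0] = x**2      # column of the shared class parameter K
        block[:, 1 + i] = 1.0   # indicator column for this realization's k_i
        blocks.append(block)
        targets.append(y)
    design = np.vstack(blocks)
    target = np.concatenate(targets)
    # The toy form is linear in (K, k_1, ..., k_n), so one least-squares
    # solve fits every parameter at once (stand-in for an LBFGS fit).
    params, *_ = np.linalg.lstsq(design, target, rcond=None)
    return params[0], params[1:]


def nrmse(y, y_pred):
    """RMSE normalized by the standard deviation of the data (assumed)."""
    return np.sqrt(np.mean((y - y_pred) ** 2)) / np.std(y)


def class_reward(datasets, K, ks):
    """Assumed reward: 1 / (1 + mean NRMSE over all realizations)."""
    scores = [nrmse(y, K * x**2 + k) for (x, y), k in zip(datasets, ks)]
    return 1.0 / (1.0 + np.mean(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_K = 2.5
    datasets = []
    for true_k in (-1.0, 0.0, 3.0):  # three realizations, one offset each
        x = rng.uniform(-2.0, 2.0, size=50)
        noise = rng.normal(0.0, 0.05, size=50)
        datasets.append((x, true_K * x**2 + true_k + noise))
    K, ks = fit_class_expression(datasets)
    print("class-wide K:", K)
    print("realization k_i:", ks)
    print("reward:", class_reward(datasets, K, ks))
```

In a general Class SR search, candidate expressions are usually nonlinear in their parameters, so an iterative optimizer such as the LBFGS-based fitting would replace the single linear solve. The structure stays the same: one shared parameter vector, one parameter set per realization, and a single reward scoring the candidate form across the whole class.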
