Active Learning in Symbolic Regression with Physical Constraints
Jorge Medina, Andrew D. White
TL;DR
This work addresses data efficiency in symbolic regression (SR) by integrating active learning and soft physical constraints. The authors implement a query-by-committee strategy that uses the Pareto frontier of candidate equations as the committee and embed physical knowledge as regularization terms, enabling rediscovery of known equations with far less data than traditional SR. Across a gravity example, the Feynman benchmarks, tests of robustness to noise, and a Shewanella growth case, the approach reduces data needs, improves interpretability, and yields physically meaningful relationships, including Gompertz-like growth parameterizations. The framework offers a practical, physics-informed pathway to data-efficient equation discovery, with broad applicability and accessible code and data.
Abstract
Evolutionary symbolic regression (SR) fits a symbolic equation to data, yielding a concise, interpretable model. We explore using SR in an active learning setting with physical constraints, where SR proposes which experiments to do next. Active learning is done with query by committee, where the Pareto frontier of equations serves as the committee. The physical constraints improve the proposed equations in very low data settings. Together, these approaches reduce the data required for SR and achieve state-of-the-art results in the amount of data required to rediscover known equations.
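
To make the query-by-committee idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the committee is a small list of candidate equations (standing in for a Pareto frontier), scores each candidate input by the variance of the committee's predictions, and queries the point where the equations disagree most. The function names, the three toy equations, and the monotonicity penalty used to illustrate a soft physical constraint are all hypothetical choices made for this example.

```python
import numpy as np


def committee_disagreement(committee, candidates):
    """Variance of the committee's predictions at each candidate input."""
    # Shape: (n_equations, n_candidates)
    predictions = np.array([equation(candidates) for equation in committee])
    return predictions.var(axis=0)


def select_next_query(committee, candidates):
    """Return the candidate input where the committee disagrees most."""
    scores = committee_disagreement(committee, candidates)
    return candidates[np.argmax(scores)]


def monotonic_decrease_penalty(equation, probe_points, weight=1.0):
    """Hypothetical soft constraint: penalize any increase of the equation
    over sorted probe points. It is added to a loss as a regularizer rather
    than used to reject equations outright."""
    values = equation(probe_points)
    increases = np.clip(np.diff(values), 0.0, None)
    return weight * increases.sum()


# Toy committee: three candidate force laws in the distance r
# (assumed stand-ins for equations on a Pareto frontier).
committee = [
    lambda r: 1.0 / r**2,
    lambda r: 1.0 / r,
    lambda r: 1.0 / r**3,
]

# Candidate experiments: distances at which the force could be measured.
candidates = np.linspace(0.5, 5.0, 10)

# The three laws agree exactly at r = 1 and diverge most at small r,
# so the smallest candidate distance is the most informative query.
next_r = select_next_query(committee, candidates)

# A decreasing law incurs no penalty; an increasing one is penalized.
penalty_inverse_square = monotonic_decrease_penalty(committee[0], candidates)
penalty_linear = monotonic_decrease_penalty(lambda r: r, candidates)
```

In a full active learning loop, the selected point would be measured, added to the training data, and SR rerun to produce an updated committee. Variance is only one possible disagreement measure, and the penalty weight would need tuning against the data-fit term.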
