Iterated Agent for Symbolic Regression
Zhuo-Yang Song, Zeyu Cai, Shutao Zhang, Jiashen Wei, Jichen Pan, Shi Qiu, Qing-Hong Cao, Tie-Jiun Hou, Xiaohui Liu, Ming-xing Luo, Hua Xing Zhu
TL;DR
This work tackles symbolic regression by moving the search from pure syntax to semantics, using an iterated agent framework in which Large Language Models generate semantically informed hypotheses guided by natural-language rationales. IdeaSearchFitter biases the search toward interpretable, physics-aligned expressions and operates within a multi-island evolutionary loop, balancing accuracy, complexity, and interpretability via a Pareto frontier. Empirical results on the Feynman Symbolic Regression Database (FSReD) demonstrate strong noise robustness and competitive recovery rates, while real-world PMLB datasets reveal interpretable, mechanism-aligned models with favorable trade-offs between normalized mean squared error (NMSE) and complexity. A frontier case study on Parton Distribution Functions (PDFs) shows compact, extrapolation-stable parametrizations that align with DGLAP evolution, underscoring the framework’s potential for physics-informed discovery and broader scientific applications.
Abstract
Symbolic regression (SR), the automated discovery of mathematical expressions from data, is a cornerstone of scientific inquiry. However, it is often hindered by the combinatorial explosion of the search space and a tendency to overfit. Popular methods, rooted in genetic programming, explore this space syntactically, often yielding overly complex, uninterpretable models. This paper introduces IdeaSearchFitter, a framework that employs Large Language Models (LLMs) as semantic operators within an evolutionary search. By generating candidate expressions guided by natural-language rationales, our method biases discovery towards models that are not only accurate but also conceptually coherent and interpretable. We demonstrate IdeaSearchFitter's efficacy across diverse challenges: it achieves competitive, noise-robust performance on the Feynman Symbolic Regression Database (FSReD), outperforming several strong baselines; discovers mechanistically aligned models with good accuracy-complexity trade-offs on real-world data; and derives compact, physically motivated parametrizations for Parton Distribution Functions in a frontier high-energy physics application. IdeaSearchFitter is a specialized module within our broader iterated agent framework, IdeaSearch, which is publicly available at https://www.ideasearch.cn/.
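As a rough illustration of the accuracy-complexity trade-off mentioned above, the following minimal sketch scores a few candidate expressions by normalized mean squared error and keeps only the non-dominated ones. It is not the IdeaSearchFitter implementation: the helper names `nmse` and `pareto_front`, the toy target `y = x**2`, the hand-assigned complexity values, and the choice to normalize by the variance of the targets are all assumptions made for this example.

```python
import numpy as np


def nmse(y_true, y_pred):
    """Mean squared error divided by the variance of y_true (assumed normalization)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2) / np.var(y_true))


def pareto_front(candidates):
    """Keep candidates not dominated in (nmse, complexity); lower is better for both."""
    front = []
    for c in candidates:
        dominated = any(
            o["nmse"] <= c["nmse"]
            and o["complexity"] <= c["complexity"]
            and (o["nmse"] < c["nmse"] or o["complexity"] < c["complexity"])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    # Sort from simplest to most complex for readability.
    return sorted(front, key=lambda c: c["complexity"])


# Hypothetical toy problem: the data follow y = x**2 exactly.
x = np.linspace(0.1, 2.0, 50)
y = x ** 2

# Hypothetical candidates; the complexity values are assigned by hand.
raw = [
    ("1.0", 1, np.ones_like(x)),
    ("x", 1, x),
    ("x**2", 3, x ** 2),
    ("x**2 + 0.01*sin(x)", 7, x ** 2 + 0.01 * np.sin(x)),
]
candidates = [
    {"expr": expr, "complexity": k, "nmse": nmse(y, pred)}
    for expr, k, pred in raw
]

front = pareto_front(candidates)
front_exprs = [c["expr"] for c in front]
for c in front:
    print(f'{c["expr"]:>8s}  complexity={c["complexity"]}  nmse={c["nmse"]:.3g}')
```

In this toy setting the constant model is dominated by `x` (same complexity, higher error) and the seven-node expression is dominated by `x**2` (more complex and less accurate), so only `x` and `x**2` remain on the front.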
