LLM-Meta-SR: In-Context Learning for Evolving Selection Operators in Symbolic Regression

Hengzhe Zhang, Qi Chen, Bing Xue, Wolfgang Banzhaf, Mengjie Zhang

Abstract

Large language models (LLMs) have revolutionized algorithm development, yet their application in symbolic regression, where algorithms automatically discover symbolic expressions from data, remains limited. In this paper, we propose a meta-learning framework that enables LLMs to automatically design selection operators for evolutionary symbolic regression algorithms. We first identify two key limitations in existing LLM-based algorithm evolution techniques: lack of semantic guidance and code bloat. The absence of semantic awareness can lead to ineffective exchange of useful code components, while bloat results in unnecessarily complex components; both can hinder evolutionary learning progress or reduce the interpretability of the designed algorithm. To address these issues, we enhance the LLM-based evolution framework for meta-symbolic regression with two key innovations: a complementary, semantics-aware selection operator and bloat control. Additionally, we embed domain knowledge into the prompt, enabling the LLM to generate more effective and contextually relevant selection operators. Our experimental results on symbolic regression benchmarks show that LLMs can devise selection operators that outperform nine expert-designed baselines, achieving state-of-the-art performance. Moreover, the evolved operator can further improve a state-of-the-art symbolic regression algorithm, achieving the best performance among 28 symbolic regression and other machine learning algorithms across 116 regression datasets. This demonstrates that LLMs can exceed expert-level algorithm design for symbolic regression.
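The complementary, semantics-aware selection idea can be illustrated with a minimal sketch. It assumes that an operator's "semantics" is its vector of per-dataset scores, in the spirit of Figure 1, where parents P1 and P2 behave differently on a dataset D. The first parent is picked by tournament on mean score, and the second is the candidate that maximizes the mean of the pair's element-wise best score, so it is strong where the first parent is weak. The `Candidate` class, the scoring convention, and the complementarity measure are illustrative assumptions, not the paper's exact definitions.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    """A candidate selection operator (hypothetical representation for this sketch)."""
    code: str
    scores: List[float]  # assumed semantics: one score per training dataset, higher is better

    @property
    def mean_score(self) -> float:
        return sum(self.scores) / len(self.scores)


def tournament(pop: List[Candidate], size: int, rng: random.Random) -> Candidate:
    """Plain tournament selection on the mean score across datasets."""
    contestants = rng.sample(pop, min(size, len(pop)))
    return max(contestants, key=lambda c: c.mean_score)


def complementarity(a: Candidate, b: Candidate) -> float:
    """Mean of the element-wise best score: high when b is strong where a is weak."""
    return sum(max(x, y) for x, y in zip(a.scores, b.scores)) / len(a.scores)


def select_complementary_pair(
    pop: List[Candidate], size: int = 3, rng: Optional[random.Random] = None
) -> Tuple[Candidate, Candidate]:
    """Pick the first parent by tournament, then the partner that best complements it."""
    if rng is None:
        rng = random.Random()
    first = tournament(pop, size, rng)
    others = [c for c in pop if c is not first]
    second = max(others, key=lambda c: complementarity(first, c))
    return first, second


# Usage: c has the best mean score; b covers c's weak second dataset better than a does.
pop = [
    Candidate("a", [0.9, 0.1, 0.5]),
    Candidate("b", [0.2, 0.9, 0.4]),
    Candidate("c", [0.8, 0.2, 0.6]),
]
first, second = select_complementary_pair(pop, size=3, rng=random.Random(0))
print(first.code, second.code)  # c b
```

Note that a and b have the same mean score here, yet b is chosen as the partner because its strengths lie on the dataset where c is weakest, which is the kind of useful code exchange that score-only selection would not target.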

Paper Structure

This paper contains 45 sections, 5 equations, 32 figures, 13 tables, and 3 algorithms.

Figures (32)

  • Figure 1: Illustration of fine-grained semantic differences between algorithms during meta-evolution. P1 and P2 are parent selection operator algorithms, O is an offspring algorithm, and D is a dataset.
  • Figure 2: Average code length of solutions in the population, and the score of the best solution, over generations without bloat control.
  • Figure 3: Workflow of LLM-driven selection operator evolution. The right-hand side shows the outer meta-evolution loop that generates candidate selection operators, while the left-hand side shows the inner SR loop that uses each candidate selection operator to evolve symbolic expressions and evaluates its performance.
  • Figure 4: Comparison across generations for different LLM-driven search strategies using GPT-4.1-Mini.
  • Figure 5: t-SNE visualization of evolved operator semantics. The shape of each point indicates whether it achieved top-3 performance on any task: stars denote top-3 performance on at least one task, while circles indicate otherwise.
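Figure 2 tracks how average code length grows over generations when no bloat control is applied. One generic remedy, sketched here under stated assumptions and not necessarily the paper's actual mechanism, is to treat code length as a second objective during survival selection: an operator is discarded when another operator scores at least as well with code that is no longer. The `survive` helper, measuring length in characters, and ranking by score within a Pareto front are all hypothetical choices for illustration.

```python
from typing import List, Tuple

Objective = Tuple[float, int]  # (score, code length); higher score and shorter code are better


def dominates(a: Objective, b: Objective) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    score_a, len_a = a
    score_b, len_b = b
    no_worse = score_a >= score_b and len_a <= len_b
    strictly_better = score_a > score_b or len_a < len_b
    return no_worse and strictly_better


def pareto_fronts(objs: List[Objective]) -> List[List[int]]:
    """Simple non-dominated sorting; returns candidate indices front by front."""
    remaining = list(range(len(objs)))
    fronts: List[List[int]] = []
    while remaining:
        front = [
            i for i in remaining
            if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)
        ]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


def survive(codes: List[str], scores: List[float], k: int) -> List[str]:
    """Keep k operators, filling from the first Pareto front and ranking by score within a front."""
    objs = [(s, len(c)) for s, c in zip(scores, codes)]  # length in characters (assumption)
    chosen: List[int] = []
    for front in pareto_fronts(objs):
        for i in sorted(front, key=lambda i: -scores[i]):
            if len(chosen) < k:
                chosen.append(i)
    return [codes[i] for i in chosen]


# Usage with placeholder "code" strings of different lengths.
codes = ["a" * 100, "b" * 40, "c" * 300, "d" * 500]
scores = [0.80, 0.79, 0.81, 0.795]
survivors = survive(codes, scores, k=3)
print([len(c) for c in survivors])  # [300, 100, 40]
```

Pure truncation by score would keep the 500-character candidate (score 0.795). The Pareto-based step drops it because shorter candidates score higher, and keeps the 40-character candidate (score 0.79) instead, which is how this kind of pressure counteracts the length growth shown in Figure 2.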