Transformer Semantic Genetic Programming for d-dimensional Symbolic Regression Problems

Philipp Anthes, Dominik Sobania, Franz Rothlauf

TL;DR

This paper addresses symbolic regression (SR) with Transformer Semantic Genetic Programming (TSGP), a semantic search framework that uses a pre-trained transformer as a zero-shot variation operator to produce offspring with controlled semantic similarity to a parent. Unlike online neural-guided methods, TSGP builds its semantic model offline; a single model generalizes across problem dimensions $d$ and applies learned semantic transformations during search, conditioned on a target semantic distance $\mathrm{SD}_t$. Across 24 real-world and synthetic SR benchmarks, TSGP achieves an average rank of $1.58$, outperforming standard GP (stdGP), SLIM_GSGP, Deep Symbolic Regression, and Denoising Autoencoder GP (DAE-GP), while generating more compact solutions than SLIM_GSGP. Smaller $\mathrm{SD}_t$ yields slower but more consistent improvements (exploitation), often with larger programs, whereas larger $\mathrm{SD}_t$ promotes faster convergence and smaller programs (exploration), so $\mathrm{SD}_t$ serves as a control for balancing the two. The work highlights the practical potential of offline transformer-based semantic variation for SR across problems of different dimensionality.

Abstract

Transformer Semantic Genetic Programming (TSGP) is a semantic search approach that uses a pre-trained transformer model as a variation operator to generate offspring programs with controlled semantic similarity to a given parent. Unlike other semantic GP approaches that rely on fixed syntactic transformations, TSGP aims to learn diverse structural variations that lead to solutions with similar semantics. We find that a single transformer model trained on millions of programs is able to generalize across symbolic regression problems of varying dimension. Evaluated on 24 real-world and synthetic datasets, TSGP significantly outperforms standard GP, SLIM_GSGP, Deep Symbolic Regression, and Denoising Autoencoder GP, achieving an average rank of 1.58 across all benchmarks. Moreover, TSGP produces more compact solutions than SLIM_GSGP, despite its higher accuracy. In addition, the target semantic distance $\mathrm{SD}_t$ is able to control the step size in the semantic space: small values of $\mathrm{SD}_t$ enable consistent improvement in fitness but often lead to larger programs, while larger values promote faster convergence and compactness. Thus, $\mathrm{SD}_t$ provides an effective mechanism for balancing exploration and exploitation.
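To make "step size in the semantic space" concrete, here is a minimal sketch that treats the semantics of a program as its vector of outputs on a fixed set of inputs and measures the Euclidean distance between two such vectors. The helper names (`semantics`, `semantic_distance`), the sample inputs, and the toy programs are assumptions for illustration only; whether the paper normalizes or otherwise approximates semantics before measuring distance is not stated in this summary.

```python
import numpy as np

def semantics(program, X):
    """Semantics of a program: its vector of outputs on the sample inputs X."""
    return np.array([program(x) for x in X], dtype=float)

def semantic_distance(f, g, X):
    """Euclidean distance between the semantics of f and g on X."""
    return float(np.linalg.norm(semantics(f, X) - semantics(g, X)))

# Hypothetical 2-dimensional inputs and toy programs (not from the paper).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
parent = lambda x: x[0] + x[1]
near_offspring = lambda x: x[0] + x[1] + 0.1   # small semantic step
far_offspring = lambda x: x[0] * x[1] + 2.0    # large semantic step

d_near = semantic_distance(parent, near_offspring, X)
d_far = semantic_distance(parent, far_offspring, X)
print(d_near, d_far)
```

In this reading, a small $\mathrm{SD}_t$ asks the variation operator for offspring like `near_offspring`, and a large $\mathrm{SD}_t$ for offspring like `far_offspring`.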

Paper Structure

This paper contains 25 sections, 1 equation, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Model Building of TSGP: (1) Diverse functions are generated and their semantics are approximated; (2) Semantically similar pairs are identified through a $k$-NN search in the semantic space; (3) These pairs are used as input-output examples to train a transformer model, conditioned on their semantic distance $\mathrm{SD}$ and the problem dimensionality $d$.
  • Figure 2: Median training RMSE of the best programs (solutions) of TSGP, stdGP, and SLIM_GSGP over generations on a subset of the analyzed datasets.
  • Figure 3: Median size of the solutions of TSGP, stdGP, and SLIM_GSGP over generations on a subset of the analyzed datasets.
  • Figure 4: Median Euclidean distance between the semantics $s(f_i)$ and $s(f_o)$ over the number of generations on a subset of the analyzed datasets. Results are for TSGP with varying $\mathrm{SD}_t$, stdGP, and SLIM_GSGP.
  • Figure 5: Median number of generations without improving the training RMSE over the number of generations. Results are for TSGP with varying $\mathrm{SD}_t$, stdGP, and SLIM_GSGP.
  • ...and 4 more figures
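
The three model-building steps in the Figure 1 caption can be sketched roughly as follows. This is an illustrative approximation, not the authors' pipeline: the random function generator, the sample sizes, the brute-force $k$-NN search, and the record layout (`input`, `output`, `SD`, `d`) are all assumptions made for this example, and the transformer training itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_function(d):
    """Hypothetical generator: a random linear term plus one product term."""
    w = rng.normal(size=d)
    i, j = rng.integers(0, d, size=2)
    c = rng.normal()
    text = f"lin({np.round(w, 2).tolist()}) + {c:.2f}*x{i}*x{j}"
    fn = lambda X, w=w, i=i, j=j, c=c: X @ w + c * X[:, i] * X[:, j]
    return text, fn

def build_pairs(d, n_functions=50, n_samples=20, k=3):
    # Step 1: generate diverse functions and approximate their semantics
    # by evaluating each one on the same random sample of inputs.
    X = rng.uniform(-1.0, 1.0, size=(n_samples, d))
    funcs = [random_function(d) for _ in range(n_functions)]
    S = np.stack([fn(X) for _, fn in funcs])  # shape (n_functions, n_samples)

    # Step 2: brute-force k-NN search in semantic space (Euclidean distance).
    D = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)  # a function is not its own neighbor
    neighbors = np.argsort(D, axis=1)[:, :k]

    # Step 3: emit input-output training examples, each annotated with the
    # pair's semantic distance SD and the problem dimensionality d.
    pairs = []
    for a in range(n_functions):
        for b in neighbors[a]:
            pairs.append({"input": funcs[a][0], "output": funcs[b][0],
                          "SD": float(D[a, b]), "d": d})
    return pairs

pairs = build_pairs(d=2)
print(len(pairs), pairs[0])
```

A real pipeline would presumably replace the brute-force distance matrix with an approximate nearest-neighbor index, since the abstract mentions training on millions of programs.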