Transformer Semantic Genetic Programming for d-dimensional Symbolic Regression Problems
Philipp Anthes, Dominik Sobania, Franz Rothlauf
TL;DR
This paper tackles symbolic regression (SR) by introducing Transformer Semantic Genetic Programming (TSGP), a semantic-search framework that uses a pre-trained transformer as a zero-shot variation operator to produce offspring with controlled semantic similarity to a parent. Unlike online neural-guided methods, TSGP trains its semantic model offline; the resulting model generalizes across problem dimensions $d$ and, during search, generates offspring conditioned on a target semantic distance $\mathrm{SD}_t$. Across 24 real-world and synthetic SR benchmarks, TSGP achieves an average rank of $1.58$, outperforming stdGP, SLIM_GSGP, Deep Symbolic Regression, and DAE-GP, while generating more compact solutions than SLIM_GSGP. Smaller $\mathrm{SD}_t$ yields slower but more consistent improvements (exploitation), whereas larger $\mathrm{SD}_t$ promotes faster convergence and smaller programs (exploration), so $\mathrm{SD}_t$ gives direct control over the exploration-exploitation balance. The work highlights the practical potential of offline transformer-based semantic variation for robust, scalable SR across problems of varying dimensionality.
Abstract
Transformer Semantic Genetic Programming (TSGP) is a semantic search approach that uses a pre-trained transformer model as a variation operator to generate offspring programs with controlled semantic similarity to a given parent. Unlike other semantic GP approaches that rely on fixed syntactic transformations, TSGP aims to learn diverse structural variations that lead to solutions with similar semantics. We find that a single transformer model trained on millions of programs generalizes across symbolic regression problems of varying dimension. Evaluated on 24 real-world and synthetic datasets, TSGP significantly outperforms standard GP, SLIM_GSGP, Deep Symbolic Regression, and Denoising Autoencoder GP, achieving an average rank of 1.58 across all benchmarks. Moreover, TSGP produces more compact solutions than SLIM_GSGP while also being more accurate. In addition, the target semantic distance $\mathrm{SD}_t$ controls the step size in the semantic space: small values of $\mathrm{SD}_t$ enable consistent improvement in fitness but often lead to larger programs, while larger values promote faster convergence and more compact programs. Thus, $\mathrm{SD}_t$ provides an effective mechanism for balancing exploration and exploitation.
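
To make the notion of a target semantic distance concrete, the following Python sketch treats a program's semantics as its output vector on the training inputs and picks, from a set of candidate offspring, the one whose distance to the parent is closest to $\mathrm{SD}_t$. This is an illustration only, not the TSGP operator: in TSGP the transformer generates offspring conditioned on $\mathrm{SD}_t$, whereas here a fixed candidate list stands in for the model. The distance metric (mean absolute difference of outputs) and the helper names `semantics`, `semantic_distance`, and `pick_offspring` are assumptions introduced for this example.

```python
import numpy as np


def semantics(program, X):
    """Semantics of a program: its output vector on the rows of X (shape n x d)."""
    return np.asarray([program(x) for x in X], dtype=float)


def semantic_distance(s_a, s_b):
    """Assumed metric for illustration: mean absolute difference of outputs."""
    return float(np.mean(np.abs(s_a - s_b)))


def pick_offspring(parent, candidates, X, sd_target):
    """Return the candidate whose distance to the parent is closest to sd_target.

    Stand-in for a learned variation operator: the candidates are given,
    not generated by a model.
    """
    s_parent = semantics(parent, X)
    dists = [semantic_distance(s_parent, semantics(c, X)) for c in candidates]
    best = int(np.argmin([abs(d - sd_target) for d in dists]))
    return candidates[best], dists[best]


# Toy 2-dimensional problem (d = 2); all programs are hypothetical.
X = np.array([[-1.0, 1.0], [0.0, 2.0], [1.0, 3.0], [2.0, 0.0]])


def parent(x):
    return x[0] * x[1]


def small_step(x):
    # Nearly the same semantics as the parent.
    return x[0] * x[1] + 0.1


def large_step(x):
    # Structurally and semantically different from the parent.
    return x[0] + x[1]


candidates = [small_step, large_step]
child_small, d_small = pick_offspring(parent, candidates, X, sd_target=0.1)
child_large, d_large = pick_offspring(parent, candidates, X, sd_target=1.0)
```

With a small target the sketch selects the candidate that barely changes the parent's outputs, and with a large target it selects the candidate that moves further in semantic space, which mirrors the step-size role the abstract attributes to $\mathrm{SD}_t$.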
