Enhancing Symbolic Regression with Quality-Diversity and Physics-Inspired Constraints
J.-P. Bruneton
TL;DR
QDSR targets exact symbolic recovery in physics-informed symbolic regression (SR) by combining genetic programming (GP) with a quality-diversity (QD) MAP-Elites grid and a dimensional analysis (DA) engine. The approach expands the vocabulary with dimensionless combinations, norms, and scalar products, uses dimensional analysis to simplify target expressions, and enforces dimensional consistency during evolution. Ablation studies show that QD alone already outperforms the prior state of the art, that dimensional analysis adds a further gain, and that vocabulary expansion solves equations that are otherwise difficult to recover. On the Feynman-AI dataset, QDSR achieves about $91.6\%$ exact recovery, surpassing previous SR methods by over 20 percentage points. The method is also robust to noise, and the results suggest that projecting individuals onto a QD grid can yield substantial performance gains in existing SR algorithms without major system overhauls. The code is open source for reproducibility. The work advances physics-informed SR by enabling interpretable symbolic recovery and by providing a blueprint for incorporating dimensional constraints into GP-based search.
Abstract
This paper presents QDSR, an advanced symbolic regression (SR) system that integrates genetic programming (GP), a quality-diversity (QD) algorithm, and a dimensional analysis (DA) engine. Our method focuses on exact symbolic recovery of known expressions from datasets, with a particular emphasis on the Feynman-AI benchmark. On this widely used collection of 117 physics equations, QDSR achieves an exact recovery rate of $91.6\%$, surpassing all previous SR methods by over 20 percentage points. Our method also exhibits strong robustness to noise. Beyond QD and DA, this high success rate results from a profitable trade-off between vocabulary expressiveness and search space size: we show that significantly expanding the vocabulary with precomputed meaningful variables (e.g., dimensionless combinations and well-chosen scalar products) often reduces equation complexity, ultimately leading to better performance. Ablation studies also show that QD alone already outperforms the state of the art. This suggests that a simple integration of QD, by projecting individuals onto a QD grid, can significantly boost performance in existing algorithms, without requiring major system overhauls.
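To make the idea of projecting individuals onto a QD grid concrete, the following is a minimal MAP-Elites-style sketch in Python. It is an illustration under stated assumptions, not the QDSR implementation: the `Individual` fields, the `descriptor` function, and the choice of grid axes (binned expression length and number of distinct variables) are all hypothetical and were picked only for readability. The key point it shows is that each grid cell keeps its single best-fitting individual, so diverse candidates survive even when they are not globally best.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Individual:
    expr: str      # candidate expression as text (illustrative only)
    loss: float    # fit error on the dataset, lower is better
    length: int    # number of tokens in the expression
    n_vars: int    # number of distinct variables used


def descriptor(ind, max_length=30, length_bins=10):
    """Map an individual to a grid cell.

    Assumption: the two axes (binned length, variable count) are
    hypothetical descriptors, not necessarily those used by QDSR.
    """
    length_bin = min(ind.length * length_bins // max_length, length_bins - 1)
    return (length_bin, ind.n_vars)


def project(grid, ind):
    """Keep `ind` if its cell is empty or it fits better than the incumbent."""
    cell = descriptor(ind)
    incumbent = grid.get(cell)
    if incumbent is None or ind.loss < incumbent.loss:
        grid[cell] = ind
        return True
    return False


grid = {}
candidates = [
    Individual("m*a", loss=0.40, length=3, n_vars=2),
    Individual("a*m", loss=0.10, length=3, n_vars=2),        # same cell, better fit
    Individual("m*a + 0*v", loss=0.10, length=7, n_vars=3),  # lands in another cell
    Individual("v*m", loss=0.90, length=3, n_vars=2),        # same cell, worse fit
]
accepted = [project(grid, c) for c in candidates]
```

In a GP loop, offspring produced by mutation and crossover would be passed through `project`, and parents for the next generation would be sampled from the occupied cells of `grid` rather than from a single fitness-ranked population.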
