
Unit-Aware Genetic Programming for the Development of Empirical Equations

Julia Reuter, Viktor Martinek, Roland Herzog, Sanaz Mostaghim

TL;DR

The paper tackles the challenge of discovering empirical equations when constants have unknown units. It extends unit-aware symbolic regression with a dimensional-analysis framework that propagates joker units through GP trees, and it introduces three constraint-handling methods (evolutive culling, a repair mechanism, and a multi-objective approach) embedded in a GP framework to enforce unit adherence while still allowing constants with unknown units. Across benchmark datasets with and without ground-truth solutions, the unit-aware methods achieve competitive accuracy and, in many cases, produce unit-adherent equations; the multi-objective approach additionally yields rich Pareto fronts. The work demonstrates that integrating dimensional analysis into GP is practical for physics-informed equation discovery, and it outlines future work on more complex benchmarks and deeper analyses of population dynamics.

Abstract

When developing empirical equations, domain experts require them to be accurate and to adhere to physical laws. Often, constants with unknown units need to be discovered alongside the equations. Traditional unit-aware genetic programming (GP) approaches cannot be used when unknown constants with undetermined units are included. This paper presents a method for dimensional analysis that propagates unknown units as "jokers" and returns the magnitude of unit violations. We propose three methods, namely evolutive culling, a repair mechanism, and a multi-objective approach, to integrate the dimensional analysis into the GP algorithm. Experiments on datasets with ground truth show that evolutive culling and the multi-objective approach perform comparably to a baseline without dimensional analysis. Extensive analysis of the results on datasets without ground truth reveals that the unit-aware algorithms sacrifice only a little accuracy while producing unit-adherent solutions. Overall, we present a promising novel approach for developing unit-adherent empirical equations.
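The joker idea can be illustrated with a small, hypothetical sketch. It assumes that units are integer exponent vectors over four SI base units (kg, m, s, K), that a free constant starts as a joker (`None`), that a joker adapts to the other operand under addition and subtraction and absorbs any unit under multiplication and division, and that `exp` requires a dimensionless argument. The class `Node` and the function `check` are illustrative names, not the authors' implementation, and counting one violation per offending node is only one possible way to measure the magnitude of unit violations.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical unit representation: integer exponents over (kg, m, s, K).
Unit = Tuple[int, int, int, int]
JOKER = None  # unknown unit of a free constant; adapts to its context
DIMENSIONLESS: Unit = (0, 0, 0, 0)


@dataclass
class Node:
    op: str                      # "var", "const", "+", "-", "*", "/", "exp"
    children: tuple = ()
    unit: Optional[Unit] = None  # known unit of a "var" leaf


def check(node: Node) -> Tuple[Optional[Unit], int]:
    """Propagate units bottom-up; return (unit or JOKER, violation count)."""
    if node.op == "var":
        return node.unit, 0
    if node.op == "const":
        return JOKER, 0
    results = [check(child) for child in node.children]
    units = [u for u, _ in results]
    violations = sum(v for _, v in results)

    if node.op in ("+", "-"):
        a, b = units
        if a is JOKER:               # joker takes the other addend's unit
            return b, violations
        if b is JOKER:
            return a, violations
        if a != b:                   # incompatible addends: one violation
            violations += 1
        return a, violations         # keep the left unit and continue

    if node.op in ("*", "/"):
        a, b = units
        if a is JOKER or b is JOKER:  # a joker can absorb any unit
            return JOKER, violations
        sign = 1 if node.op == "*" else -1
        return tuple(p + sign * q for p, q in zip(a, b)), violations

    if node.op == "exp":
        (a,) = units
        if a is not JOKER and a != DIMENSIONLESS:
            violations += 1          # exp needs a dimensionless argument
        return DIMENSIONLESS, violations

    raise ValueError(f"unknown operator: {node.op!r}")
```

For example, with `x` in metres and `t` in seconds, `x / t` yields `(0, 1, -1, 0)` with no violations, `x + t` yields one violation, and `c * x + t` is accepted because the joker `c` can implicitly take the unit s/m.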

Paper Structure

This paper contains 19 sections, 2 figures, and 4 tables.

Figures (2)

  • Figure 1: Measurements on the Pareto-optimal front for datasets with unknown solutions from thermodynamics and fluid mechanics over 31 independent runs.
  • Figure 2: Solutions from the 31 combined Pareto-optimal (PO) fronts per algorithm on the TD dataset. The magnitude of unit violations is color-coded from white (0 violations) to black (22 violations), 22 being the maximum number of unit violations on the TD dataset.
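Figure 2 pools the Pareto-optimal fronts of 31 independent runs per algorithm. If one wants the overall non-dominated set of such a pool, a minimal sketch follows. It assumes every objective is minimized and, purely for illustration, uses (error, unit violations) pairs; since the figure color-codes violations, the objectives actually spanning the paper's fronts may differ. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical objective vector, e.g. (error, unit violations); all minimized.
Objectives = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Return the non-dominated points, with duplicates removed, in sorted order."""
    unique = sorted(set(points))
    return [p for p in unique if not any(dominates(q, p) for q in unique)]


def combine_fronts(fronts: List[List[Objectives]]) -> List[Objectives]:
    """Pool the fronts of several independent runs and keep the overall front."""
    return pareto_front([p for front in fronts for p in front])


# Two hypothetical runs with (error, violations) pairs.
run_1 = [(0.10, 0), (0.05, 3)]
run_2 = [(0.08, 0), (0.02, 5), (0.06, 3)]
print(combine_fronts([run_1, run_2]))  # [(0.02, 5), (0.05, 3), (0.08, 0)]
```

Recomputing dominance after pooling discards points that were optimal within their own run but are beaten by another run, such as `(0.10, 0)` here.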