Shape Arithmetic Expressions: Advancing Scientific Discovery Beyond Closed-Form Equations

Krzysztof Kacprzyk; Mihaela van der Schaar

TL;DR

The paper tackles two limitations: symbolic regression (SR) often cannot yield compact, interpretable expressions for relationships that lack a closed form, while generalized additive models (GAMs) cannot capture intricate feature interactions. It proposes Shape Arithmetic Expressions (SHAREs), a unifying model class that combines GAM-like univariate shape functions with the feature interactions expressible by mathematical expression trees, together with a rule-based transparency framework. The authors formalize SHAREs, establish theoretical properties concerning their size and depth, and demonstrate via experiments (including a torque and a temperature problem) that SHAREs can outperform both SR and GAMs while preserving interpretability. This work advances AI4Science by enabling transparent, interaction-aware discovery of scientific relationships from data, with potential applicability across physics, biology, and engineering.
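To make the interaction limitation concrete, here is a small numerical sketch (not taken from the paper). On data generated by the pure interaction y = x1 * x2 with independent inputs symmetric around zero, the best additive fit b0 + s1(x1) + s2(x2) explains essentially none of the variance, whereas a single multiplication node recovers it exactly. The piecewise-constant bin basis used for the shape functions, the helper name `bin_onehot`, and all data settings are illustrative assumptions, not the fitting procedure used by the authors.

```python
import numpy as np

# Synthetic data with a pure interaction and no additive component: y = x1 * x2.
rng = np.random.default_rng(0)
n = 20_000
x1 = rng.uniform(-1.0, 1.0, n)
x2 = rng.uniform(-1.0, 1.0, n)
y = x1 * x2


def bin_onehot(x, n_bins=20):
    """Piecewise-constant basis: one indicator column per equal-width bin on [-1, 1]."""
    edges = np.linspace(-1.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    return np.eye(n_bins)[idx]


# Additive (GAM-style) design: intercept + shape function of x1 + shape function of x2.
# One bin per feature is dropped to avoid collinearity with the intercept.
X_gam = np.hstack([np.ones((n, 1)), bin_onehot(x1)[:, 1:], bin_onehot(x2)[:, 1:]])
coef_gam, *_ = np.linalg.lstsq(X_gam, y, rcond=None)
r2_gam = 1.0 - np.mean((y - X_gam @ coef_gam) ** 2) / np.var(y)

# A single multiplication node with identity shape functions: y ~ c * x1 * x2.
X_prod = (x1 * x2)[:, None]
coef_prod, *_ = np.linalg.lstsq(X_prod, y, rcond=None)
r2_prod = 1.0 - np.mean((y - X_prod @ coef_prod) ** 2) / np.var(y)

print(f"additive R^2 = {r2_gam:.4f}, product R^2 = {r2_prod:.4f}")
```

Because E[y | x1] = x1 * E[x2] = 0 here, no function of x1 alone (and likewise of x2 alone) can explain y, so the additive R^2 should come out close to zero regardless of how flexible the shape functions are.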

Abstract

Symbolic regression has excelled in uncovering equations from physics, chemistry, biology, and related disciplines. However, its effectiveness becomes less certain when applied to experimental data lacking inherent closed-form expressions. Empirically derived relationships, such as entire stress-strain curves, may defy concise closed-form representation, compelling us to explore more adaptive modeling approaches that balance flexibility with interpretability. In our pursuit, we turn to Generalized Additive Models (GAMs), a widely used class of models known for their versatility across various domains. Although GAMs can capture non-linear relationships between variables and targets, they cannot capture intricate feature interactions. In this work, we investigate both of these challenges and propose a novel class of models, Shape Arithmetic Expressions (SHAREs), that fuses GAM's flexible shape functions with the complex feature interactions found in mathematical expressions. SHAREs also provide a unifying framework for both of these approaches. We also design a set of rules for constructing SHAREs that guarantee transparency of the found expressions beyond the standard constraints based on the model's size.
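The idea of a SHARE as an expression tree whose nodes are variables, univariate shape functions, and arithmetic operations can be sketched as below. This is a minimal illustration under stated assumptions: the class names `Var`, `Shape`, `BinOp` and the function `evaluate` are hypothetical, the shape functions are fixed NumPy functions rather than functions learned from data, and the paper's transparency rules are not enforced (for example, whether a shape function may be applied to a compound subexpression is governed by those rules).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Union

import numpy as np


@dataclass
class Var:
    """Leaf node: reads input feature x_i (hypothetical class name)."""
    index: int


@dataclass
class Shape:
    """Unary node: applies a univariate shape function s(.) to its child's output."""
    fn: Callable[[np.ndarray], np.ndarray]
    child: Node


@dataclass
class BinOp:
    """Binary node: one of the arithmetic operations +, -, *, /."""
    op: str
    left: Node
    right: Node


Node = Union[Var, Shape, BinOp]
OPS = {"+": np.add, "-": np.subtract, "*": np.multiply, "/": np.divide}


def evaluate(node: Node, X: np.ndarray) -> np.ndarray:
    """Recursively evaluate the tree on a batch X of shape (n_samples, n_features)."""
    if isinstance(node, Var):
        return X[:, node.index]
    if isinstance(node, Shape):
        return node.fn(evaluate(node.child, X))
    return OPS[node.op](evaluate(node.left, X), evaluate(node.right, X))


# GAM-like tree: a sum of shape functions, each applied to a single variable.
gam = BinOp("+", Shape(np.tanh, Var(0)), Shape(np.square, Var(1)))

# SHARE-like tree with an interaction: a product of two shape functions,
# the second applied to the compound subexpression x1 + x2.
share = BinOp("*", Shape(np.tanh, Var(0)), Shape(np.exp, BinOp("+", Var(1), Var(2))))

X = np.array([[0.5, 1.0, -1.0],
              [0.0, 2.0, 3.0]])
print(evaluate(gam, X))
print(evaluate(share, X))
```

In this encoding a GAM is simply the special case in which every shape function sits directly on a variable and the only operation joining them is addition, which mirrors the claim that SHAREs unify both approaches.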

Paper Structure

This paper contains 89 sections, 3 theorems, 14 equations, 10 figures, 15 tables, and 1 algorithm.

INTRODUCTION
Symbolic Regression
Generalized Additive Models
Transparency of Closed-Form Expressions
Contributions and Outline
LIMITATIONS OF CURRENT APPROACHES
Symbolic Regression Struggles With Expressions That Are Not Closed-Form.
GAMs Cannot Model Complex Interactions
Choice of Equations
Results
SHAPE ARITHMETIC EXPRESSIONS
Why Univariate Functions?
TRANSPARENCY
Understanding by Decomposing: Rule-based Transparency
Rationale for Rule 1
...and 74 more sections

Key Result

Proposition 1

Let $f:\mathbb{R}^n \rightarrow \mathbb{R}$ be a transparent SHARE. Then

Figures (10)

Figure 1: Stress-strain curves of aluminum at different temperatures
Figure 2: Shape Arithmetic Expression represented as a tree.
Figure 3: Plot of a function $s_1(x) = \frac{1+x}{\sqrt{1-x^2}}$.
Figure 4: Results of fitting SHAREs to the risk score data. Each row in the table shows the best-found equation with the corresponding number of shape functions (#s). Bottom: shape functions from the fourth row compared with the ground truth.
Figure 5: Equations found by fitting SHAREs to a torque equation $\tau = r F \sin(\theta)$. Each row in the table shows the best-found equation with the corresponding number of shape functions (#s). Bottom left panel: shape function from the second row compared to ground truth. Bottom right panel: shape functions from the fourth row.
...and 5 more figures
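The torque experiment of Figure 5 can be illustrated with a simplified sketch: if the expression structure tau ~ r * F * s(theta) is taken as given, the model is linear in the coefficients of the shape function s, so s can be fitted by ordinary least squares and then compared with sin(theta). This is an assumption-laden illustration, not the paper's procedure: the paper searches over expression structures, while here the structure is fixed, and the piecewise-linear "hat" basis, the helper names `hat_basis` and `s_hat`, the data ranges, and the noise level are all hypothetical choices.

```python
import numpy as np

# Synthetic torque data tau = r * F * sin(theta) with small additive noise.
rng = np.random.default_rng(0)
n = 5_000
r = rng.uniform(0.5, 2.0, n)
F = rng.uniform(1.0, 10.0, n)
theta = rng.uniform(-np.pi, np.pi, n)
tau = r * F * np.sin(theta) + rng.normal(0.0, 0.05, n)

knots = np.linspace(-np.pi, np.pi, 41)
h = knots[1] - knots[0]


def hat_basis(x, knots, h):
    """Piecewise-linear 'hat' basis: column k peaks at knots[k] and vanishes beyond +/- h."""
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - knots[None, :]) / h)


# Assumed structure tau ~ r * F * s(theta): linear in the basis coefficients of s.
design = (r * F)[:, None] * hat_basis(theta, knots, h)
coef, *_ = np.linalg.lstsq(design, tau, rcond=None)


def s_hat(x):
    """Learned shape function s(theta), a piecewise-linear interpolant of the coefficients."""
    return hat_basis(np.atleast_1d(x), knots, h) @ coef


grid = np.linspace(-np.pi, np.pi, 200)
max_err = np.max(np.abs(s_hat(grid) - np.sin(grid)))
print(f"max |s(theta) - sin(theta)| on grid: {max_err:.4f}")
```

Plotting `s_hat` against `np.sin` on the grid gives a comparison analogous to the bottom-left panel of Figure 5, in which a learned shape function is compared with the ground truth.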

Theorems & Definitions (13)

Remark 1
Definition 1
Remark 2
Proposition 1
Proof
Corollary 1
Definition 2: Active variables
Definition 3: Subtree of a node
Proof
Lemma 1
...and 3 more
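The listed notions of a node's subtree and its active variables, and the size and depth quantities mentioned in the TL;DR, can be computed on an expression tree as in the sketch below. It assumes that the active variables of a node are the input variables appearing as leaves in that node's subtree, that size counts all nodes, and that depth counts nodes on the longest root-to-leaf path; the paper's formal definitions may differ in detail. The tuple encoding and the function names `children`, `subtree`, `active_vars`, `size`, and `depth` are hypothetical.

```python
from typing import Any, Iterator, List, Set, Tuple

# Hypothetical tuple encoding of an expression tree:
#   ("var", i)                   -> input variable x_i
#   ("shape", name, child)       -> univariate shape function applied to child
#   ("op", symbol, left, right)  -> binary arithmetic operation
Node = Tuple[Any, ...]


def children(node: Node) -> List[Node]:
    """Direct children of a node (none for a variable leaf)."""
    kind = node[0]
    if kind == "var":
        return []
    if kind == "shape":
        return [node[2]]
    return [node[2], node[3]]


def subtree(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants in pre-order."""
    yield node
    for child in children(node):
        yield from subtree(child)


def active_vars(node: Node) -> Set[int]:
    """Indices of input variables appearing as leaves in the node's subtree."""
    return {n[1] for n in subtree(node) if n[0] == "var"}


def size(node: Node) -> int:
    """Total number of nodes in the tree rooted at `node`."""
    return sum(1 for _ in subtree(node))


def depth(node: Node) -> int:
    """Number of nodes on the longest root-to-leaf path (a leaf has depth 1)."""
    return 1 + max((depth(c) for c in children(node)), default=0)


# Torque-like tree r * F * s(theta) with x0 = r, x1 = F, x2 = theta.
expr = ("op", "*", ("op", "*", ("var", 0), ("var", 1)), ("shape", "s", ("var", 2)))
print(size(expr), depth(expr), active_vars(expr))
# Which variables each shape function depends on:
print([active_vars(n) for n in subtree(expr) if n[0] == "shape"])
```

Computing active variables per node is the kind of bookkeeping needed to check, for instance, which inputs a given shape function depends on; the specific constraints that make a SHARE transparent are those stated in the paper's rules.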
