Multi-View Symbolic Regression

Etienne Russeil, Fabrício Olivetti de França, Konstantin Malanchev, Bogdan Burlacu, Emille E. O. Ishida, Marion Leroux, Clément Michelin, Guillaume Moinard, Emmanuel Gangler

TL;DR

Multi-View Symbolic Regression (MvSR) takes multiple datasets into account simultaneously, mimicking experimental environments, and outputs a general parametric solution that is robust to changes in hyperparameters.

Abstract

Symbolic regression (SR) searches for analytical expressions representing the relationship between a set of explanatory and response variables. Current SR methods assume a single dataset extracted from a single experiment. Frequently, however, the researcher is confronted with multiple sets of results obtained from experiments conducted with different setups. Traditional SR methods may fail to find the underlying expression, since the parameters of each experiment can be different. In this work we present Multi-View Symbolic Regression (MvSR), which takes multiple datasets into account simultaneously, mimicking experimental environments, and outputs a general parametric solution. This approach fits the evaluated expression to each independent dataset and returns a parametric family of functions $f(x; \theta)$ capable of accurately fitting all datasets simultaneously. We demonstrate the effectiveness of MvSR using data generated from known expressions, as well as real-world data from astronomy, chemistry and economics, for which an a priori analytical expression is not available. Results show that MvSR obtains the correct expression more frequently and is robust to changes in hyperparameters. On real-world data, it is able to capture the group behavior, recovering known expressions from the literature as well as promising alternatives, thus enabling the use of SR in a wide range of experimental scenarios.
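
The core idea in the abstract, fitting one functional form to each dataset with its own parameters and then judging the form by how well it fits all of them, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the candidate form is fixed to a straight line $f(x; a, b) = a x + b$ so that the per-view fit has a closed form, the two views are synthetic, the helper names (`fit_line`, `multiview_loss`, `pooled_loss`) are hypothetical, and the worst-case (maximum) aggregation of per-view errors is one possible choice among several.

```python
from statistics import mean


def fit_line(xs, ys):
    """Closed-form least squares for the candidate f(x; a, b) = a*x + b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return a, b


def mse(xs, ys, a, b):
    """Mean squared error of the line a*x + b on one dataset."""
    return mean((a * x + b - y) ** 2 for x, y in zip(xs, ys))


def multiview_loss(views):
    """Fit the same functional form to each view separately, then aggregate."""
    per_view = []
    for xs, ys in views:
        a, b = fit_line(xs, ys)  # parameters are refit for every view
        per_view.append(mse(xs, ys, a, b))
    return max(per_view)  # assumption: worst-case aggregation across views


def pooled_loss(views):
    """Single-dataset baseline: merge all views and fit one parameter set."""
    xs = [x for view_xs, _ in views for x in view_xs]
    ys = [y for _, view_ys in views for y in view_ys]
    a, b = fit_line(xs, ys)
    return mse(xs, ys, a, b)


# Two synthetic views sharing the form a*x + b but with different parameters.
xs = [0.0, 1.0, 2.0, 3.0]
views = [
    (xs, [2.0 * x + 1.0 for x in xs]),   # view 1: a = 2,  b = 1
    (xs, [-1.0 * x + 4.0 for x in xs]),  # view 2: a = -1, b = 4
]

mv = multiview_loss(views)
pooled = pooled_loss(views)
print(f"multi-view loss: {mv:.3f}, pooled loss: {pooled:.3f}")
```

In this toy setting the shared form is correct for both views, so the per-view fits are exact, while a single pooled parameter set cannot describe both experiments at once. In a full SR search, a score of this kind would be computed for every candidate expression rather than for one fixed form.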

Paper Structure

This paper contains 12 sections, 5 equations, 5 figures, and 4 tables.

Figures (5)

  • Figure 1: Heatmaps of the median MSE for the tested combinations of noise and maximum expression size. The rows show results for the $f_1$, $f_1$ partial domains, $f_2$ and $f_3$ benchmarks, respectively. Columns represent the worst single-view (left), best single-view (center), and MvSR (right) results. The colorbar represents the median MSE for that configuration, ranging from $0$ (white) to a clipped value of $5$ (dark blue). The clipping makes small values easier to compare.
  • Figure 2: Critical difference diagram of the average rank w.r.t. the absolute difference between the number of parameters of a benchmark and the number of parameters of the model. This diagram was built using the Friedman hypothesis test with $\alpha=0.05$ and Holm–Bonferroni correction. The names ex1, ex2, ex3, ex4 refer to the four single-view models.
  • Figure 3: Best MvSR fit (Equation \ref{eq:blb}) of the absorption as a function of the molar concentration for 4 different molecules. Gray lines correspond to Beer's law fitted to the data points for which $A \leq 1$.
  • Figure 4: Normalized (Section \ref{sec:data_gen}) distribution of returns for 2 assets, fitted by the Cauchy model and the best MvSR solution (Power-Laplace). The first asset is an example of a large MSE improvement when using Power-Laplace instead of Cauchy.
  • Figure 5: Best fit of two parametric functions found by MvSR on SN2021mwb in the $g$ and $r$ filters. The right panel corresponds to the Bazin function commonly used in the literature.
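
The caption of Figure 2 mentions a Holm–Bonferroni correction at $\alpha=0.05$. As a reminder of how that step-down procedure works, here is a minimal sketch. The function name `holm_bonferroni` and the p-values are hypothetical and chosen only for illustration; they are not taken from the paper, and the Friedman test itself is not reproduced here.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True where the null is rejected."""
    m = len(p_values)
    # Visit hypotheses from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The threshold loosens at each step: alpha/m, alpha/(m-1), ..., alpha.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down rule: stop at the first non-rejection
    return reject


# Hypothetical p-values for four pairwise comparisons.
decisions = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
print(decisions)  # [True, False, False, True]
```

The third smallest p-value (0.03) exceeds its threshold of 0.05/2 = 0.025, so it and every larger p-value are retained, even though 0.04 alone is below 0.05.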