
Optimal spanning tree reconstruction in symbolic regression

Radoslav G. Neychev, Innokentiy A. Shibaev, Vadim V. Strijov

TL;DR

This work reframes symbolic regression as recovering an optimal superposition tree from a probabilistic, colored graph where edge weights encode the probability of combining primitives. It connects the model-reconstruction task to minimum-spanning-tree style problems, notably k-MST and prize-collecting Steiner tree (PCST) frameworks, and develops LP-based and approximation approaches to handle arity constraints and directed edges. The authors compare several reconstruction strategies, with Prim’s algorithm delivering the strongest performance and suggesting PCST-based methods as viable baselines. The results illuminate a principled, graph-theoretic pathway for structural learning in symbolic regression, with practical implications for efficiently recovering admissible model structures under uncertainty and noise.

Abstract

This paper investigates the problem of regression model generation. A model is a superposition of primitive functions. The model structure is described by a weighted colored graph. Each graph vertex corresponds to some primitive function. An edge assigns a superposition of two functions. The weight of an edge equals the probability of superposition. To generate an optimal model one has to reconstruct its structure from its graph adjacency matrix. The proposed algorithm reconstructs the minimum spanning tree from the weighted colored graph. This paper presents a novel solution based on the prize-collecting Steiner tree algorithm. This algorithm is compared with its alternatives.
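Since the TL;DR reports Prim's algorithm as the strongest reconstruction strategy, a minimal sketch of Prim's minimum-spanning-tree construction may help make the graph-theoretic framing concrete. The toy graph, the primitive names, and the probability-to-weight mapping (w = -log p, so that minimizing total weight maximizes the product of superposition probabilities) are illustrative assumptions, not the paper's exact implementation.

```python
import heapq
from math import log

def prim_mst(adj, root):
    """Prim's algorithm on an undirected weighted graph.

    adj: {vertex: [(neighbor, weight), ...]}
    Returns a list of tree edges (u, v, w).
    """
    visited = {root}
    heap = [(w, root, v) for v, w in adj[root]]
    heapq.heapify(heap)
    tree = []
    while heap and len(visited) < len(adj):
        w, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # skip edges into already-connected vertices
        visited.add(v)
        tree.append((u, v, w))
        for nxt, w2 in adj[v]:
            if nxt not in visited:
                heapq.heappush(heap, (w2, v, nxt))
    return tree

# Toy graph: vertices stand for primitive functions; edge weights are
# -log(p) of a hypothetical superposition probability p, so the MST
# corresponds to the most probable connected structure.
p = {('sin', 'x'): 0.9, ('sin', 'plus'): 0.4, ('plus', 'x'): 0.7}
adj = {v: [] for pair in p for v in pair}
for (u, v), prob in p.items():
    w = -log(prob)
    adj[u].append((v, w))
    adj[v].append((u, w))

tree = prim_mst(adj, 'sin')  # two edges spanning the three vertices
```

Note that this plain MST sketch ignores the arity constraints and edge directions that motivate the paper's k-MST and prize-collecting Steiner tree formulations; it only illustrates the spanning-tree baseline.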


Paper Structure

This paper contains 11 sections, 5 theorems, 22 equations, 2 figures, 2 tables, 4 algorithms.

Key Result

Lemma 1

Let $B\subseteq S\subset V$. If $f(S) = 0$ and $f(B) = 0$, then $f(S\setminus B) = 0$.

Figures (2)

  • Figure 1: The regression model structure is a directed graph
  • Figure 2: Quality of the reconstruction algorithms with primitive functions of small arities and unordered inputs

Theorems & Definitions (13)

  • Definition 1
  • Definition 2
  • Lemma 1
  • Definition 3
  • Lemma 2
  • Theorem 1
  • Lemma 3
  • Theorem 2
  • Proof 1: to Lemma 1
  • Proof 2: to Lemma 2
  • ...and 3 more