In-Context Symbolic Regression for Robustness-Improved Kolmogorov-Arnold Networks

Francesco Sovrano, Lidia Losavio, Giulia Vilone, Marc Langheinrich

Abstract

Symbolic regression aims to replace black-box predictors with concise analytical expressions that can be inspected and validated in scientific machine learning. Kolmogorov-Arnold Networks (KANs) are well suited to this goal because each connection between adjacent units (an "edge") is parametrised by a learnable univariate function that can, in principle, be replaced by a symbolic operator. In practice, however, symbolic extraction is a bottleneck: the standard KAN-to-symbol approach fits operators to each learned edge function in isolation, which makes the discrete choice sensitive to initialisation and to non-convex parameter fitting, and ignores how local substitutions interact through the full network. We study in-context symbolic regression for operator extraction in KANs and present two complementary instantiations. Greedy in-context Symbolic Regression (GSR) selects edge replacements greedily, ranking candidates by the end-to-end loss improvement they yield after brief fine-tuning. Gated Matching Pursuit (GMP) amortises this in-context selection by training a differentiable gated operator layer that places an operator library behind sparse gates on each edge; after convergence, the gates are discretised (optionally followed by a short in-context greedy refinement pass). We quantify robustness via one-factor-at-a-time (OFAT) hyper-parameter sweeps and assess both predictive error and qualitative consistency of the recovered formulas. Across several experiments, greedy in-context symbolic regression achieves up to a 99.8% reduction in median OFAT test MSE.
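
The greedy in-context selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Assumed here: a toy one-layer network whose output is the sum of its edge functions, a four-operator library, hand-written stand-ins for the learned edge functions, and a closed-form refit of a single scale per candidate in place of the brief fine-tuning step. All names (`LIBRARY`, `fit_scale`, `greedy_in_context`) are hypothetical.

```python
import math

# Hypothetical operator library (illustrative; not the paper's actual library).
LIBRARY = {
    "identity": lambda x: x,
    "square": lambda x: x * x,
    "sin": math.sin,
    "exp": math.exp,
}

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def fit_scale(feature, residual):
    """Least-squares scale c for c * feature ~ residual (stand-in for fine-tuning)."""
    denom = sum(f * f for f in feature)
    return sum(f * r for f, r in zip(feature, residual)) / denom if denom else 0.0

def greedy_in_context(edges, X, y):
    """Replace edges one at a time, always committing the (edge, operator)
    pair that gives the lowest end-to-end MSE of the whole network."""
    edges = list(edges)
    chosen = {}
    while len(chosen) < len(edges):
        best = None
        for i in range(len(edges)):
            if i in chosen:
                continue
            # Contribution of every other edge in its current state
            # (still learned, or already symbolic).
            others = [sum(e(x[j]) for j, e in enumerate(edges) if j != i) for x in X]
            residual = [t - o for t, o in zip(y, others)]
            for name, op in LIBRARY.items():
                feature = [op(x[i]) for x in X]
                c = fit_scale(feature, residual)
                loss = mse([o + c * f for o, f in zip(others, feature)], y)
                if best is None or loss < best[0]:
                    best = (loss, i, name, c)
        loss, i, name, c = best
        edges[i] = lambda x, op=LIBRARY[name], c=c: c * op(x)  # commit replacement
        chosen[i] = (name, c)
    return chosen, loss

# Toy data: y = 2*sin(x1) + 0.5*x2^2 on a symmetric grid.
grid = [k / 10 for k in range(-10, 11)]
X = [(a, b) for a in grid for b in grid]
y = [2 * math.sin(a) + 0.5 * b * b for a, b in X]

# Stand-ins for imperfect learned edge functions (assumed, not trained splines).
learned = [
    lambda a: 2 * math.sin(a) + 0.05 * a,
    lambda b: 0.5 * b * b + 0.05 * math.cos(3 * b),
]

chosen, final_mse = greedy_in_context(learned, X, y)
print(chosen, final_mse)
```

Each candidate is scored through the full network output rather than against the isolated edge function, so the distortion left in a not-yet-replaced edge affects the ranking. That dependence on context is the property the abstract contrasts with per-edge fitting.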

Paper Structure (52 sections, 9 equations, 3 figures, 3 tables, 2 algorithms)

Figures (3)

  • Figure 1: Problem overview: isolated per-edge KAN-to-symbol fitting (AutoSym) is unstable and ignores end-to-end context.
  • Figure 2: Method overview: GSR selects operators by end-to-end loss improvement; GMP amortises in-context selection via sparse operator gates during training, then discretises (optionally refined by a short greedy pass) to reduce candidate-trial cost.
  • Figure 3: OFAT hyper-parameter sensitivity distributions. Violins summarise test MSE across all valid one-factor-at-a-time runs obtained by varying hidden width, $\lambda$, and the number of pruning cycles around the reference configuration; dots denote individual observations. This figure aggregates hyper-parameter perturbations only and does not average over the seed-only repeats from Table \ref{tab:feynman_seed_sensitivity}. Red $\times$ markers indicate that a method produced no valid OFAT runs for that dataset and is therefore absent from the violin aggregation. Lower, tighter distributions indicate lower sensitivity and hence greater robustness.
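
The OFAT protocol behind Figure 3 can be sketched as follows. This is a hedged illustration, not the paper's code. The three factor names follow the caption, but the reference values, the grids, the exclusion of the reference configuration itself, and the reading of a "valid" run as one with a finite test MSE are all assumptions.

```python
import math
from statistics import median

def ofat_configs(reference, grids):
    """Yield configurations that differ from `reference` in exactly one factor."""
    for factor, values in grids.items():
        for value in values:
            if value != reference[factor]:
                yield {**reference, factor: value}

def summarise(test_mses):
    """Median test MSE over valid runs; None when no run is valid
    (the case the caption marks with a red cross)."""
    valid = [m for m in test_mses if m is not None and math.isfinite(m)]
    return median(valid) if valid else None

# Hypothetical reference configuration and sweep grids.
reference = {"hidden_width": 5, "lam": 1e-3, "prune_cycles": 2}
grids = {
    "hidden_width": [3, 5, 8],
    "lam": [1e-4, 1e-3, 1e-2],
    "prune_cycles": [1, 2, 3],
}

configs = list(ofat_configs(reference, grids))
for cfg in configs:
    print(cfg)
```

Because only one factor moves at a time, the number of runs grows with the sum of the grid sizes rather than their product. The trade-off is that interactions between factors are not probed.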