Softly Symbolifying Kolmogorov-Arnold Networks

James Bagrow, Josh Bongard

TL;DR

This work introduces Softly Symbolified Kolmogorov-Arnold Networks (S2KAN), which build symbolic primitives into KAN activations through differentiable gates and a Minimum Description Length (MDL) objective. Because symbolic and dense terms are trained together end to end, S2KAN discovers interpretable forms when symbolic terms suffice and falls back to dense splines when they do not. Across symbolic benchmarks, dynamical systems forecasting, and real-world prediction tasks, it reaches competitive or superior accuracy with substantially smaller models, and shows evidence of self-sparsification even without regularization pressure. The approach gives a principled, controllable tradeoff between predictive accuracy and model size for interpretable scientific machine learning.

Abstract

Kolmogorov-Arnold Networks (KANs) offer a promising path toward interpretable machine learning: their learnable activations can be studied individually, while collectively fitting complex data accurately. In practice, however, trained activations often lack symbolic fidelity, learning pathological decompositions with no meaningful correspondence to interpretable forms. We propose Softly Symbolified Kolmogorov-Arnold Networks (S2KAN), which integrate symbolic primitives directly into training. Each activation draws from a dictionary of symbolic and dense terms, with learnable gates that sparsify the representation. Crucially, this sparsification is differentiable, enabling end-to-end optimization, and is guided by a principled Minimum Description Length objective. When symbolic terms suffice, S2KAN discovers interpretable forms; when they do not, it gracefully degrades to dense splines. We demonstrate competitive or superior accuracy with substantially smaller models across symbolic benchmarks, dynamical systems forecasting, and real-world prediction tasks, and observe evidence of emergent self-sparsification even without regularization pressure.
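The gated-dictionary idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the sigmoid gate, the particular primitives, the piecewise-linear stand-in for a dense spline, and the parameter-count penalty weighted by `beta` are all assumptions chosen for illustration, and every name below is hypothetical.

```python
import math

# Hypothetical symbolic dictionary (an assumption; the summary does not list
# the primitives the paper actually uses).
SYMBOLIC = {
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "sin": math.sin,
}


def sigmoid(z):
    """Soft gate in (0, 1); smooth in z, so it can be trained by gradients."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_term(x, knots, values):
    """Piecewise-linear interpolant, used here as a simple stand-in for a spline."""
    if x <= knots[0]:
        return values[0]
    if x >= knots[-1]:
        return values[-1]
    for i in range(len(knots) - 1):
        if knots[i] <= x <= knots[i + 1]:
            t = (x - knots[i]) / (knots[i + 1] - knots[i])
            return (1.0 - t) * values[i] + t * values[i + 1]


def activation(x, coeffs, logits, knots, values):
    """One activation: gated sum of symbolic terms plus a gated dense term."""
    out = 0.0
    for name, f in SYMBOLIC.items():
        out += sigmoid(logits[name]) * coeffs[name] * f(x)
    out += sigmoid(logits["dense"]) * dense_term(x, knots, values)
    return out


def penalized_loss(data, coeffs, logits, knots, values, beta):
    """Mean squared error plus beta times a gate-weighted parameter count.

    The penalty is an assumed, simplified proxy for a description-length
    term: each symbolic term costs one parameter and the dense term costs
    one parameter per knot value.
    """
    mse = sum(
        (activation(x, coeffs, logits, knots, values) - y) ** 2 for x, y in data
    ) / len(data)
    cost = sum(sigmoid(logits[name]) for name in SYMBOLIC)
    cost += sigmoid(logits["dense"]) * len(values)
    return mse + beta * cost


if __name__ == "__main__":
    # Gates that keep only the sin term open: the activation is close to sin(x).
    logits = {"x": -30.0, "x^2": -30.0, "sin": 30.0, "dense": -30.0}
    coeffs = {"x": 1.0, "x^2": 1.0, "sin": 1.0}
    knots, values = [-1.0, 0.0, 1.0], [0.5, 0.5, 0.5]
    data = [(x, math.sin(x)) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
    print(activation(1.0, coeffs, logits, knots, values))
    print(penalized_loss(data, coeffs, logits, knots, values, beta=0.1))
```

In this sketch, closing a gate removes a term smoothly rather than by a hard selection step, which is what lets the fit and the sparsity pressure be optimized together. Charging the dense term more than a symbolic one means it stays active only when the symbolic terms cannot fit the data on their own.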

Paper Structure

This paper contains 16 sections, 17 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Learning $y = \mathrm{sinc}(x) = \sin(x)/x$ with a multiplicative Kolmogorov-Arnold network (KAN). The standard KAN lacks symbolic fidelity, exhibiting a numerically accurate but otherwise pathological decomposition, whereas S2KAN perfectly captures the underlying sinc function.
  • Figure 2: Dynamical system modeling for the Ikeda map. (a) $\beta=0$, 20k epochs. (b) $\beta=0.1$, 4k epochs. Starting from the initial condition, the regularized S2KAN forecasts the dynamics accurately for as long as the baseline does (shaded region).
  • Figure 3: Dynamical system modeling for the ecosystem. (a) $\beta=0$, 15k epochs. (b) $\beta=0.1$, 10k epochs. The unregularized S2KAN captures the multi-step dynamics well.
  • Figure 4: Learning dynamics for S2KAN on the superconductor dataset, comparing shallow $[5,5,1]$ (top) and deep $[5,5,5,1]$ (bottom) architectures with $\beta=0.5$. Gray shading indicates the warmup period ($\beta=0$).
  • Figure 5: Self-sparsification of the Ikeda map (a) and ecosystem networks (b). Horizontal dashed lines indicate the number of active terms in the baseline KAN. Training MSEs are presented for reference.