Softly Symbolifying Kolmogorov-Arnold Networks
James Bagrow, Josh Bongard
TL;DR
This work introduces Softly Symbolified Kolmogorov-Arnold Networks (S2KAN), which infuse symbolic primitives into KAN activations via differentiable gates and a Minimum Description Length (MDL) objective. By unifying symbolic and dense representations in a single end-to-end training loop, S2KAN can discover interpretable forms where possible and gracefully fall back to dense splines where not, achieving competitive accuracy with substantially smaller models. Empirical results across symbolic benchmarks, chaotic dynamical systems, and real-world datasets show robust gains in interpretability and compression, and the models self-sparsify even without strong regularization. The approach offers a principled, controllable tradeoff between predictive power and model parsimony, advancing interpretable neural architectures for scientific machine learning.
Abstract
Kolmogorov-Arnold Networks (KANs) offer a promising path toward interpretable machine learning: their learnable activations can be studied individually, while collectively fitting complex data accurately. In practice, however, trained activations often lack symbolic fidelity, learning pathological decompositions with no meaningful correspondence to interpretable forms. We propose Softly Symbolified Kolmogorov-Arnold Networks (S2KAN), which integrate symbolic primitives directly into training. Each activation draws from a dictionary of symbolic and dense terms, with learnable gates that sparsify the representation. Crucially, this sparsification is differentiable, enabling end-to-end optimization, and is guided by a principled Minimum Description Length objective. When symbolic terms suffice, S2KAN discovers interpretable forms; when they do not, it gracefully degrades to dense splines. We demonstrate competitive or superior accuracy with substantially smaller models across symbolic benchmarks, dynamical systems forecasting, and real-world prediction tasks, and observe evidence of emergent self-sparsification even without regularization pressure.
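To make the idea of a gated dictionary concrete, here is a minimal sketch of one edge activation built from symbolic and dense terms with differentiable gates. It is not the authors' implementation, and it rests on assumptions of our own. The primitive set (`x`, `x^2`, `sin`, `exp`) is arbitrary. Each term is switched on or off by a sigmoid gate over a learnable logit. The dense part uses degree-1 (hat) B-splines as a simple stand-in for the higher-order splines KANs usually use. Finally, the sum of gate probabilities serves as a crude size proxy in place of the paper's MDL objective. All names (`PRIMITIVES`, `GatedActivation`, `expected_active_terms`, `active_terms`) are hypothetical.

```python
import math

# Hypothetical symbolic primitives (assumption: the actual S2KAN dictionary may differ).
PRIMITIVES = {
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "sin": math.sin,
    "exp": math.exp,
}


def sigmoid(z: float) -> float:
    """Numerically stable logistic function, used here as a soft on/off gate."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hat(x: float, center: float, width: float) -> float:
    """Degree-1 B-spline (hat function) peaking at `center`, support of half-width `width`."""
    return max(0.0, 1.0 - abs(x - center) / width)


class GatedActivation:
    """One edge activation: gated symbolic terms plus a gated dense spline (a sketch)."""

    def __init__(self, grid_size: int = 5, lo: float = -1.0, hi: float = 1.0):
        self.names = list(PRIMITIVES)
        self.weights = {n: 0.0 for n in self.names}      # per-term coefficients
        self.gate_logits = {n: 0.0 for n in self.names}  # sigmoid(0) = 0.5 at init
        step = (hi - lo) / (grid_size - 1)
        self.centers = [lo + i * step for i in range(grid_size)]  # uniform knot grid
        self.width = step
        self.spline_coefs = [0.0] * grid_size
        self.dense_gate_logit = 0.0

    def __call__(self, x: float) -> float:
        # Symbolic part: each primitive scaled by its coefficient and its soft gate.
        out = sum(
            sigmoid(self.gate_logits[n]) * self.weights[n] * PRIMITIVES[n](x)
            for n in self.names
        )
        # Dense part: piecewise-linear spline, scaled by its own soft gate.
        dense = sum(c * hat(x, m, self.width) for c, m in zip(self.spline_coefs, self.centers))
        return out + sigmoid(self.dense_gate_logit) * dense

    def expected_active_terms(self) -> float:
        """Sum of gate probabilities: a differentiable size proxy (not the paper's MDL term)."""
        return sum(sigmoid(g) for g in self.gate_logits.values()) + sigmoid(self.dense_gate_logit)

    def active_terms(self, threshold: float = 0.5) -> list:
        """Hard readout: names of terms whose gate probability exceeds `threshold`."""
        kept = [n for n in self.names if sigmoid(self.gate_logits[n]) > threshold]
        if sigmoid(self.dense_gate_logit) > threshold:
            kept.append("spline")
        return kept


# Usage: hand-set parameters to mimic a trained edge that has settled on 2*sin(x).
act = GatedActivation()
act.weights["sin"] = 2.0
for name in act.names:
    act.gate_logits[name] = 8.0 if name == "sin" else -8.0
act.dense_gate_logit = -8.0
print(act.active_terms())  # ['sin']
print(act(0.5), 2.0 * math.sin(0.5))  # nearly equal; the gate is close to, not exactly, 1
```

Because each gate is a smooth function of its logit, a gradient-based optimizer could drive unneeded gates toward zero while fitting the data. This is one plausible way to realize the differentiable sparsification the abstract describes. Training code and the actual MDL formulation are omitted.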
