A Graph Meta-Network for Learning on Kolmogorov-Arnold Networks

Guy Bar-Shalom; Ami Tavory; Itay Evron; Maya Bechler-Speicher; Ido Guy; Haggai Maron

TL;DR

This work introduces a symmetry-aware framework for learning in the weight-space of Kolmogorov–Arnold Networks (KANs) by constructing a KAN-graph and a graph neural network WS-KAN that processes it. By proving that KANs share permutation symmetries with traditional networks and showing WS-KAN can simulate a KAN's forward pass, the authors provide both theoretical and empirical justification for using graph-based WS models on KANs. The approach yields strong performance across INR classification, accuracy prediction, and pruning mask prediction, significantly outperforming structure-agnostic baselines and offering favorable generalization properties. The release of a model zoo and code promotes reproducibility and encourages further exploration of symmetry-aware weight-space learning for this novel network class.

Abstract

Weight-space models learn directly from the parameters of neural networks, enabling tasks such as predicting their accuracy on new datasets. Naive methods, such as applying MLPs to flattened parameters, perform poorly, making the design of better weight-space architectures a central challenge. While prior work leveraged permutation symmetries in standard networks to guide such designs, no analogous analysis or tailored architecture yet exists for Kolmogorov-Arnold Networks (KANs). In this work, we show that KANs share the same permutation symmetries as MLPs, and propose the KAN-graph, a graph representation of their computation. Building on this, we develop WS-KAN, the first weight-space architecture that learns on KANs, which naturally accounts for their symmetry. We analyze WS-KAN's expressive power, showing it can replicate an input KAN's forward pass, a standard approach for assessing expressiveness in weight-space architectures. We construct a comprehensive "zoo" of trained KANs spanning diverse tasks, which we use as benchmarks to empirically evaluate WS-KAN. Across all tasks, WS-KAN consistently outperforms structure-agnostic baselines, often by a substantial margin. Our code is available at https://github.com/BarSGuy/KAN-Graph-Metanetwork.
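
To make the idea of a graph representation of a KAN's computation concrete, here is a minimal sketch of one plausible construction. It is not taken from the paper or its repository. It assumes that each neuron becomes a node and each univariate function becomes a directed edge whose feature vector holds that function's parameters. The names `KanGraph` and `build_kan_graph`, and the nested-list layout of `theta`, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class KanGraph:
    num_nodes: int
    node_layer: list      # node_layer[v] = index of the layer neuron v belongs to
    edges: list           # (source node, target node) pairs
    edge_features: list   # edge_features[e] = parameters of the function on edge e


def build_kan_graph(theta):
    """Build a graph from KAN parameters (assumed layout, see lead-in).

    theta[l][j][i] is the parameter list of the univariate function that
    connects input i to output j of layer l; layers run from first to last.
    """
    # Layer widths: input width, then the output width of every layer.
    widths = [len(theta[0][0])] + [len(phi) for phi in theta]

    # offsets[l] is the node id of the first neuron in layer l.
    offsets = [0]
    for w in widths:
        offsets.append(offsets[-1] + w)

    node_layer = [l for l, w in enumerate(widths) for _ in range(w)]

    edges, edge_features = [], []
    for l, phi in enumerate(theta):
        for j, row in enumerate(phi):
            for i, params in enumerate(row):
                edges.append((offsets[l] + i, offsets[l + 1] + j))
                edge_features.append(list(params))

    return KanGraph(offsets[-1], node_layer, edges, edge_features)


# A toy KAN with widths 2 -> 3 -> 1 and 4 parameters per univariate function.
toy_theta = [
    [[[0.0] * 4 for _ in range(2)] for _ in range(3)],
    [[[0.0] * 4 for _ in range(3)] for _ in range(1)],
]
toy_graph = build_kan_graph(toy_theta)
print(toy_graph.num_nodes, len(toy_graph.edges))
```

Under this assumed construction, relabeling the hidden neurons only renames nodes, which is the kind of structure a graph neural network can process without depending on the neuron ordering.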

Paper Structure (42 sections, 6 theorems, 41 equations, 17 figures, 11 tables)


Introduction
Background and related work
Learning on KAN parameter spaces
Overview.
Permutation symmetries in KANs
KAN-graph
Learning on the KAN-graph
Expressive power of WS-KAN
Experiments
INR classification
Out-of-distribution generalization to wider KANs
Accuracy prediction
Pruning mask prediction
Conclusions
Appendix Roadmap
...and 27 more sections

Key Result

Proposition 3.0

Let $\theta = (\bm{\phi}^L, \ldots, \bm{\phi}^{1})$ denote the collection of parametric one-dimensional functions composing an $L$-layer KAN. Consider the group $G \coloneqq S_{d_1} \times S_{d_2} \times \cdots \times S_{d_{L-1}}$, the direct product of symmetric groups corresponding to the intermediate … Then, $f_\theta(\bm{x}) = f_{\theta'}(\bm{x})$ for all $\bm{x}$.
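
The invariance $f_\theta(\bm{x}) = f_{\theta'}(\bm{x})$ can be checked numerically on a toy example. The sketch below is an illustration under stated assumptions, not the paper's code. It assumes $\theta'$ is obtained by reordering the neurons of each hidden layer (of widths $d_1, \ldots, d_{L-1}$) together with the univariate functions attached to them, and it uses low-degree polynomials as a stand-in for the B-splines typically used in KANs. All function names are hypothetical, and layers are stored from first to last, the reverse of the ordering in $\theta$ above.

```python
import math
import random


def kan_layer(phi, x):
    """One KAN layer: output j is sum_i phi[j][i](x[i]).

    phi[j][i] is a list of polynomial coefficients, lowest degree first
    (an illustrative stand-in for a spline parameterization).
    """
    return [
        sum(sum(c * xi**k for k, c in enumerate(edge)) for edge, xi in zip(row, x))
        for row in phi
    ]


def kan_forward(theta, x):
    """Evaluate the KAN; theta lists the layers from first to last."""
    for phi in theta:
        x = kan_layer(phi, x)
    return x


def permute_hidden(theta, perms):
    """Apply perms[l] to the neurons between layer l and layer l + 1.

    The outputs (rows) of layer l and the inputs (columns) of layer l + 1
    are reordered consistently, so each function stays attached to its neuron.
    """
    new = [[list(row) for row in phi] for phi in theta]
    for l, p in enumerate(perms):
        new[l] = [new[l][j] for j in p]
        new[l + 1] = [[row[j] for j in p] for row in new[l + 1]]
    return new


def random_kan(widths, degree, rng):
    """Random polynomial KAN with the given layer widths (hypothetical helper)."""
    return [
        [
            [[rng.gauss(0, 0.3) for _ in range(degree + 1)] for _ in range(d_in)]
            for _ in range(d_out)
        ]
        for d_in, d_out in zip(widths, widths[1:])
    ]


rng = random.Random(0)
widths = [3, 5, 4, 2]  # input width 3, hidden widths 5 and 4, output width 2
theta = random_kan(widths, degree=3, rng=rng)
perms = [rng.sample(range(d), d) for d in widths[1:-1]]  # one permutation per hidden layer
x = [rng.uniform(-1, 1) for _ in range(widths[0])]

out_original = kan_forward(theta, x)
out_permuted = kan_forward(permute_hidden(theta, perms), x)
outputs_match = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
    for a, b in zip(out_original, out_permuted)
)
print(outputs_match)
```

The comparison uses a tolerance because permuting the neurons changes the order of floating-point summation. Input and output neurons are left in place, matching the group $G$, which only has factors for the hidden widths.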

Figures (17)

Figure 1: Constructing the KAN-graph for a given Kolmogorov-Arnold Network (KAN).
Figure 2: Hidden neuron permutation symmetries in KANs.
Figure 3: PE.
Figure 4: Downstream pruning performance across methods over KANs trained on MNIST. We report: (i) Test accuracy: the downstream accuracy of pruned networks, averaged over non-overlapping bins of 20%, to highlight the relative effectiveness of pruning strategies under varying noise levels; (ii) Kept weights: the percentage of weights retained after pruning, averaged over the same bins as in (i); and (iii) Pruning time $(\downarrow)$: the computational cost in seconds (log scale) of WS-KAN compared with Oracle prune; lower is better.
Figure 5: Example of a KAN-based INR applied to the synthetic 2D sine wave dataset. The left panel shows the ground truth, and the right panel shows the reconstructed result.
...and 12 more figures

Theorems & Definitions (10)

Proposition 3.0: KAN symmetries
Lemma 4.0: MLP as an approximation of the univariate functions composing the KAN
Proposition 4.0: WS-KAN can simulate the forward pass of KANs
proof: Proof idea
Lemma E.0: MLP as an approximation of the univariate functions composing the KAN
proof
Proposition E.0: WS-KAN can simulate the forward pass of KANs
proof
Proposition E.0: KAN symmetries
proof
