Merging of Kolmogorov-Arnold networks trained on disjoint datasets
Andrew Polar, Michael Poluektov
TL;DR
The paper demonstrates that training of Kolmogorov-Arnold networks with piecewise-linear basis functions can be accelerated by training on disjoint data batches and merging the resulting models via parameter averaging, a federated-learning-inspired approach. Using Newton-Kaczmarz optimization, the authors show near-linear strong scaling on multicore hardware and good weak scaling on an HPC cluster, with substantial speedups over traditional neural networks. They also introduce pretraining and concurrent-serial alternations to further boost performance, cutting training time by minutes on large-scale tasks while maintaining competitive accuracy. The work highlights practical pathways to faster KAN training, including potential FPGA/ASIC implementations, and provides publicly available code.
Abstract
Training on disjoint datasets can serve two primary goals: accelerating data processing and enabling federated learning. It has already been established that Kolmogorov-Arnold networks (KANs) are particularly well suited for federated learning and can be merged through simple parameter averaging. While the federated learning literature has mostly focused on achieving training convergence across distributed nodes, the present paper specifically targets acceleration of training, which depends critically on the choice of optimisation method and the type of basis functions. To the best of the authors' knowledge, the fastest currently available combination is the Newton-Kaczmarz method with piecewise-linear basis functions. Here, it is shown that training on disjoint datasets (or disjoint subsets of the training dataset) can further improve performance. Experimental comparisons are provided, and all corresponding code is publicly available.
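
To make the idea of merging by parameter averaging concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single univariate piecewise-linear function on a uniform grid rather than a full KAN, and it uses a plain relaxed Kaczmarz projection as a stand-in for the Newton-Kaczmarz method. All names (`hat_weights`, `kaczmarz_train`, `merge_by_averaging`) and all settings (grid size, relaxation factor `mu`, number of epochs, the sine target) are hypothetical choices made for illustration.

```python
import math
import random


def hat_weights(x, xmin, xmax, n_nodes):
    """Return the left node index and the weights of the two hat functions active at x."""
    t = (x - xmin) / (xmax - xmin) * (n_nodes - 1)
    i = min(max(int(t), 0), n_nodes - 2)  # clamp so that x == xmax uses the last interval
    frac = t - i
    return i, (1.0 - frac, frac)


def evaluate(theta, x, xmin, xmax):
    """Evaluate the piecewise-linear function whose nodal values are theta."""
    i, (w0, w1) = hat_weights(x, xmin, xmax, len(theta))
    return w0 * theta[i] + w1 * theta[i + 1]


def kaczmarz_train(data, n_nodes, xmin, xmax, mu=0.5, epochs=20):
    """Fit nodal values record by record with a relaxed Kaczmarz projection.

    Assumption: this is the plain Kaczmarz step for a model that is linear in
    its parameters, used here in place of the Newton-Kaczmarz method.
    """
    theta = [0.0] * n_nodes
    for _ in range(epochs):
        for x, y in data:
            i, (w0, w1) = hat_weights(x, xmin, xmax, n_nodes)
            residual = y - (w0 * theta[i] + w1 * theta[i + 1])
            step = mu * residual / (w0 * w0 + w1 * w1)
            theta[i] += step * w0
            theta[i + 1] += step * w1
    return theta


def merge_by_averaging(models):
    """Merge models that share the same grid by averaging their nodal values."""
    return [sum(values) / len(models) for values in zip(*models)]


# Hypothetical toy problem: approximate sin(x) on [0, 2*pi].
XMIN, XMAX, N_NODES = 0.0, 2.0 * math.pi, 11
rng = random.Random(0)
samples = [(x, math.sin(x)) for x in (rng.uniform(XMIN, XMAX) for _ in range(400))]

# Two disjoint subsets, which could be trained concurrently on separate workers.
subset_a, subset_b = samples[:200], samples[200:]
model_a = kaczmarz_train(subset_a, N_NODES, XMIN, XMAX)
model_b = kaczmarz_train(subset_b, N_NODES, XMIN, XMAX)
merged = merge_by_averaging([model_a, model_b])


def rmse(theta, data):
    """Root-mean-square error of the model theta on data."""
    return math.sqrt(sum((evaluate(theta, x, XMIN, XMAX) - y) ** 2 for x, y in data) / len(data))


rmse_merged = rmse(merged, samples)
print("RMSE of merged model on all samples:", rmse_merged)
```

Because a piecewise-linear function on a fixed grid is linear in its nodal values, averaging the parameters of two such functions gives exactly the pointwise average of the functions. This exactness holds only for the single-function case in the sketch; a multi-layer network composes such functions, so the sketch should not be read as evidence about the merged accuracy of a full KAN.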
