Want to train KANS at scale? Now UKAN!
Alireza Moradzadeh, Srimukh Prasad Veccham, Lukasz Wawrzyniak, Miles Macklin, Saee G. Paliwal
TL;DR
This paper introduces Unbounded Kolmogorov-Arnold Networks (UKANs), which remove the bounded-grid limitation of Kolmogorov-Arnold Networks (KANs) by using a coefficient-generator (CG) MLP to produce B-spline coefficients on an unbounded grid. The CG takes positional encodings of grid groups as input and returns only the spline coefficients needed locally, which enables function approximation on unbounded domains without data normalization. A GPU-accelerated library, warpKAN, speeds up B-spline evaluation and supports large-scale training. Empirical results across regression, classification, approximation, generation, and drug-discovery tasks show that UKANs match or surpass KAN performance, with 3–30x speedups and up to 1000x memory reductions relative to vanilla KANs. The work demonstrates practical scalability for molecular property prediction and other scientific domains, positions UKAN as a building block for large-scale, spline-based neural architectures, and points toward future directions such as multi-GPU training and adaptive knot policies.
Abstract
Kolmogorov-Arnold Networks (KANs) have recently emerged as a powerful alternative to traditional multilayer perceptrons. However, their reliance on predefined, bounded grids restricts their ability to approximate functions on unbounded domains. To address this, we present Unbounded Kolmogorov-Arnold Networks (UKANs), a method that removes the need for bounded grids in traditional KANs. The key innovation of this method is a coefficient-generator (CG) model that produces, on the fly, only the B-spline coefficients required locally on an unbounded symmetric grid. UKANs couple multilayer perceptrons with KANs by feeding the positional encoding of grid groups into the CG model, enabling function approximation on unbounded domains without requiring data normalization. To reduce the computational cost of both UKANs and KANs, we introduce a GPU-accelerated library that lowers B-spline evaluation complexity by a factor proportional to the grid size. Through efficient memory management, it enables large-scale learning, in line with recent software advances such as FlashAttention and FlashFFTConv. Performance benchmarking confirms the superior memory and computational efficiency of our accelerated KAN (warpKAN) and of UKANs, showing a 3-30x speed-up and up to 1000x memory reduction compared to vanilla KANs. Experiments on regression, classification, and generative tasks show that UKANs match or surpass KAN accuracy. Finally, we use both accelerated KAN and UKAN in a molecular property prediction task, establishing the feasibility of large-scale end-to-end training with our optimized implementation.
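To make the coefficient-generator idea more concrete, the following is a minimal sketch of how a spline on an unbounded grid might be evaluated when coefficients are generated on demand rather than stored. It is not the paper's implementation and does not use the warpKAN API. The cubic degree, the uniform knot spacing, the sinusoidal encoding, the per-index (rather than per-group) generation, the untrained random weights, and all names (`cubic_basis`, `positional_encoding`, `CoefficientGenerator`, `unbounded_spline`) are assumptions chosen for illustration.

```python
import math
import random


def cubic_basis(t):
    """Values of the four uniform cubic B-spline pieces that are nonzero
    at local coordinate t in [0, 1]. They always sum to 1."""
    return (
        (1 - t) ** 3 / 6,
        (3 * t**3 - 6 * t**2 + 4) / 6,
        (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6,
        t**3 / 6,
    )


def positional_encoding(index, n_freq=4):
    """Sinusoidal encoding of an integer grid index (assumed, not from the paper)."""
    feats = []
    for k in range(n_freq):
        angle = index * 2.0 ** -k
        feats += [math.sin(angle), math.cos(angle)]
    return feats


class CoefficientGenerator:
    """Tiny MLP with random, untrained weights: encoded grid index -> one coefficient."""

    def __init__(self, n_freq=4, hidden=16, seed=0):
        rng = random.Random(seed)
        self.n_freq = n_freq
        self.w1 = [[rng.gauss(0, 1) for _ in range(2 * n_freq)] for _ in range(hidden)]
        self.w2 = [rng.gauss(0, 1) / math.sqrt(hidden) for _ in range(hidden)]

    def __call__(self, index):
        feats = positional_encoding(index, self.n_freq)
        hidden = [math.tanh(sum(w * f for w, f in zip(row, feats))) for row in self.w1]
        return sum(w * h for w, h in zip(self.w2, hidden))


def unbounded_spline(x, cg, spacing=1.0):
    """Evaluate a cubic spline at x on the infinite uniform grid spacing * Z."""
    u = x / spacing
    cell = math.floor(u)   # index of the knot interval containing x
    t = u - cell           # local coordinate inside that interval
    # Only basis functions cell-1 .. cell+2 are nonzero at x, so only their
    # coefficients are generated; no global coefficient table is stored.
    coeffs = [cg(cell + offset) for offset in range(-1, 3)]
    return sum(b * c for b, c in zip(cubic_basis(t), coeffs))
```

In this sketch, a call such as `unbounded_spline(1e4, CoefficientGenerator())` works without rescaling the input to a fixed interval, and each evaluation touches four coefficients regardless of how far the input lies from the origin. This is one way to read the abstract's claim that the evaluation cost need not grow with the grid size.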
