DKL-KAN: Scalable Deep Kernel Learning using Kolmogorov-Arnold Networks

Shrenik Zinage, Sudeepta Mondal, Soumalya Sarkar

TL;DR

Deep kernel learning (DKL) combines deep representations with Gaussian process regression to obtain scalable, uncertainty-aware predictions. The paper introduces DKL-KAN, a scalable deep kernel built on Kolmogorov-Arnold Networks, and compares it against DKL-MLP, using KISS-GP for low-dimensional inputs and KISS-GP with product kernels for high-dimensional inputs. Contributions include two KAN configurations (DKL-KAN1, with the same neuron count as the MLP, and DKL-KAN2, with a similar parameter budget), extensive UCI regression experiments, and an analysis of discontinuities and uncertainty. Results show that DKL-KAN excels on small datasets but lags behind DKL-MLP on very large ones, highlighting a scalability trade-off. The work provides an alternative to MLP-based deep kernels with improved uncertainty handling and guides the future design of scalable deep kernel methods.

Abstract

The need for scalable and expressive models in machine learning is paramount, particularly in applications requiring both structural depth and flexibility. Traditional deep learning methods, such as multilayer perceptrons (MLP), offer depth but lack the ability to integrate the structural characteristics of deep learning architectures with the non-parametric flexibility of kernel methods. To address this, deep kernel learning (DKL) was introduced, where inputs to a base kernel are transformed using a deep learning architecture. These kernels can replace standard kernels, allowing both expressive power and scalability. The advent of Kolmogorov-Arnold Networks (KAN) has generated considerable attention and discussion among researchers in scientific domains. In this paper, we introduce a scalable deep kernel using KAN (DKL-KAN) as an effective alternative to DKL using MLP (DKL-MLP). Our approach involves simultaneously optimizing these kernel attributes using the marginal likelihood within a Gaussian process framework. We analyze two variants of DKL-KAN for a fair comparison with DKL-MLP: one with the same number of neurons and layers as DKL-MLP, and another with approximately the same number of trainable parameters. To handle large datasets, we use kernel interpolation for scalable structured Gaussian processes (KISS-GP) for low-dimensional inputs and KISS-GP with product kernels for high-dimensional inputs. The efficacy of DKL-KAN is evaluated in terms of computational training time and test prediction accuracy across a wide range of applications. The effectiveness of DKL-KAN is also examined in modeling discontinuities and accurately estimating prediction uncertainty. The results indicate that DKL-KAN outperforms DKL-MLP on datasets with a low number of observations. Conversely, DKL-MLP exhibits better scalability and higher test prediction accuracy on datasets with a large number of observations.
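
To make the deep kernel idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes an RBF base kernel, a small two-layer MLP as the feature map (the names `feature_map`, `deep_kernel`, and all sizes are hypothetical), and an exact Gaussian process rather than KISS-GP. In DKL-KAN the feature map would be a KAN instead; that part is not shown here. The function `neg_log_marginal_likelihood` is the quantity that would be minimized jointly over the network weights and the kernel hyperparameters.

```python
import numpy as np

def feature_map(X, W1, b1, W2, b2):
    """Hypothetical two-layer MLP g(x; w); a stand-in for the deep architecture."""
    H = np.tanh(X @ W1 + b1)
    return H @ W2 + b2

def rbf_kernel(A, B, lengthscale=1.0, outputscale=1.0):
    """RBF base kernel evaluated between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return outputscale * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def deep_kernel(X1, X2, params, lengthscale=1.0, outputscale=1.0):
    """Deep kernel: the base kernel applied to transformed inputs g(x; w)."""
    Z1 = feature_map(X1, *params)
    Z2 = feature_map(X2, *params)
    return rbf_kernel(Z1, Z2, lengthscale, outputscale)

def neg_log_marginal_likelihood(X, y, params, lengthscale=1.0,
                                outputscale=1.0, noise_var=0.1):
    """Exact GP negative log marginal likelihood (assumption: no KISS-GP)."""
    n = X.shape[0]
    K = deep_kernel(X, X, params, lengthscale, outputscale) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # 0.5 * y^T K^{-1} y + 0.5 * log|K| + 0.5 * n * log(2 * pi)
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * n * np.log(2.0 * np.pi)

# Toy data: 20 points with 3 inputs, mapped to a 2-dimensional feature space.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = np.sin(X[:, 0])
params = (rng.normal(size=(3, 8)), np.zeros(8), rng.normal(size=(8, 2)), np.zeros(2))

K = deep_kernel(X, X, params)
nll = neg_log_marginal_likelihood(X, y, params)
```

In practice the gradient of this objective with respect to `params`, the lengthscale, the output scale, and the noise variance would come from an automatic differentiation framework, and the exact Cholesky factorization would be replaced by a scalable approximation for large datasets.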

Paper Structure

This paper contains 15 sections, 18 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Schematic of DKL-KAN with 3 inputs and 1 output
  • Figure 2: Average training time across datasets (compute: one NVIDIA A100-SXM4-40GB GPU with 512 GB RAM)
  • Figure 3: GP prediction
  • Figure 4: Learned mapping