Low Tensor-Rank Adaptation of Kolmogorov–Arnold Networks

Yihang Gao, Michael K. Ng, Vincent Y. F. Tan

TL;DR

The paper introduces LoTRA, a Tucker-decomposition–based low tensor-rank adaptation method for transfer learning and fine-tuning of Kolmogorov–Arnold Networks (KANs). By updating only a small core tensor and a set of factor matrices per layer, LoTRA substantially reduces the number of trainable parameters while matching or approaching the performance of full fine-tuning. The analysis shows that an adaptive learning-rate strategy is needed: identical learning rates across all components lead to inefficient training. Theoretical results establish expressiveness bounds and efficient-training conditions that guide practical learning-rate choices. Empirically, LoTRA enables efficient PDE solving across parameter variations, and Slim KANs demonstrate parameter-efficient function representation and image classification, showing potential for scalable physics-informed learning.

Abstract

Kolmogorov–Arnold networks (KANs) have demonstrated their potential as an alternative to multi-layer perceptrons (MLPs) in various domains, especially for science-related tasks. However, transfer learning of KANs remains a relatively unexplored area. In this paper, inspired by Tucker decomposition of tensors and evidence of low tensor-rank structure in KAN parameter updates, we develop low tensor-rank adaptation (LoTRA) for fine-tuning KANs. We study the expressiveness of LoTRA based on Tucker decomposition approximations. Furthermore, we provide a theoretical analysis to select the learning rates for each LoTRA component to enable efficient training. Our analysis also shows that using identical learning rates across all components leads to inefficient training, highlighting the need for an adaptive learning rate strategy. Beyond theoretical insights, we explore the application of LoTRA for efficiently solving various partial differential equations (PDEs) by fine-tuning KANs. Additionally, we propose Slim KANs that incorporate the inherent low tensor-rank properties of KAN parameter tensors to reduce model size while maintaining superior performance. Experimental results validate the efficacy of the proposed learning rate selection strategy and demonstrate the effectiveness of LoTRA for transfer learning of KANs in solving PDEs. Further evaluations on Slim KANs for function representation and image classification tasks highlight the expressiveness of LoTRA and the potential for parameter reduction through low tensor-rank decomposition.
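
To make the Tucker-form update concrete, here is a minimal NumPy sketch of how a low tensor-rank update to one layer's third-order parameter tensor could be assembled from a core tensor and three factor matrices. The tensor sizes, the ranks, the function names `tucker_update` and `lotra_param_count`, and the zero-initialized core are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def tucker_update(core, u1, u2, u3):
    """Assemble delta = core x_1 u1 x_2 u2 x_3 u3 (mode products of a Tucker form)."""
    return np.einsum("abc,ia,jb,kc->ijk", core, u1, u2, u3)


def lotra_param_count(shape, ranks):
    """Trainable parameters of the Tucker-form update: core plus three factors."""
    (n1, n2, n3), (r1, r2, r3) = shape, ranks
    return r1 * r2 * r3 + n1 * r1 + n2 * r2 + n3 * r3


rng = np.random.default_rng(0)
shape = (32, 32, 8)  # hypothetical layer tensor sizes (assumption)
ranks = (4, 4, 2)    # hypothetical Tucker ranks (assumption)

# Frozen pre-trained parameter tensor of one layer (random stand-in).
pretrained = rng.standard_normal(shape)

# Trainable pieces. A zero core makes the update start at zero; this
# initialization is an assumption for illustration only.
core = np.zeros(ranks)
u1, u2, u3 = (rng.standard_normal((n, r)) for n, r in zip(shape, ranks))

finetuned = pretrained + tucker_update(core, u1, u2, u3)

# With a dense core, every mode unfolding of the update has rank at most
# the corresponding Tucker rank.
delta = tucker_update(rng.standard_normal(ranks), u1, u2, u3)
mode1_rank = np.linalg.matrix_rank(delta.reshape(shape[0], -1))

print(lotra_param_count(shape, ranks), "trainable vs", int(np.prod(shape)), "full")
# prints: 304 trainable vs 8192 full
```

In this toy setting the update carries 304 trainable numbers instead of 8192, which illustrates the kind of parameter reduction the abstract describes; the actual savings depend on the layer sizes and ranks chosen.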

Paper Structure

This paper contains 28 sections, 4 theorems, 37 equations, 9 figures, 4 tables.

Key Result

Lemma 1

For each $\ell \in [L]$, there exist $\left(\mathcal{G}_{\ell}, \bm{U}_{\ell}^{(1)}, \bm{U}_{\ell}^{(2)}, \bm{U}_{\ell}^{(3)}\right)$ such that [...], where $\sigma_{r}(\cdot)$ denotes the $r$-th largest singular value of the given matrix, and $\mathcal{A}_{\ell,\text{ft}}$ follows eq_update_finetune.
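
For orientation, the classical truncated higher-order SVD bound is the kind of statement a Tucker approximation lemma phrased in terms of singular values typically takes. The display below is a sketch of that standard result, not necessarily the paper's exact inequality; the target ranks $r_k$, the mode sizes $n_k$, and the mode-$k$ unfolding notation $\bm{A}_{\ell,(k)}$ of $\mathcal{A}_{\ell,\text{ft}}$ are assumed notation.

$$
\left\| \mathcal{A}_{\ell,\text{ft}} - \mathcal{G}_{\ell} \times_1 \bm{U}_{\ell}^{(1)} \times_2 \bm{U}_{\ell}^{(2)} \times_3 \bm{U}_{\ell}^{(3)} \right\|_F^2 \;\le\; \sum_{k=1}^{3} \sum_{r=r_k+1}^{n_k} \sigma_{r}^{2}\!\left(\bm{A}_{\ell,(k)}\right)
$$

Read this way, the approximation error is controlled by the trailing singular values of the three unfoldings, so rapidly decaying singular values (as Figure 1 reports for the fine-tuned updates) would imply that small Tucker ranks suffice.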

Figures (9)

  • Figure 1: Singular values for the pre-trained model parameters and fine-tuned updates, demonstrating the low Tucker rank structure.
  • Figure 2: Fine-tuning trajectories of Chebyshev KANs using LoTRA under four strategies of learning rate selection (denoted as "LR-1" to "LR-4") for solving elliptic equations.
  • Figure 3: Fine-tuning trajectories of Chebyshev KANs using LoTRA, compared to fully updated KANs and MLPs, for solving elliptic equations.
  • Figure 4: Fine-tuning trajectories of Chebyshev KANs using LoTRA under four strategies of learning rate selection (denoted as "LR-1" to "LR-4") for solving Allen-Cahn equations.
  • Figure 5: Fine-tuning trajectories of Chebyshev KANs using LoTRA, compared to fully updated KANs and MLPs, for solving Allen-Cahn equations.
  • ...and 4 more figures

Theorems & Definitions (5)

  • Lemma 1: Tucker Approximation (De Lathauwer et al., 2000)
  • Theorem 1
  • Definition 1
  • Theorem 2
  • Corollary 1