Catastrophic Forgetting in Kolmogorov-Arnold Networks
Mohammad Marufur Rahman, Guanchu Wang, Kaixiong Zhou, Minghan Chen, Fan Yang
TL;DR
The paper investigates catastrophic forgetting in Kolmogorov-Arnold Networks (KANs) and presents a theoretical framework that connects forgetting to activation support overlap and intrinsic data dimension. It validates the theory with synthetic and image-classification experiments and introduces KAN-LoRA, a KAN-based adapter for continual knowledge editing in language models, demonstrating improved retention over MLP-based adapters in many settings. Key findings show that KANs exhibit strong retention in low-dimensional, structured tasks but are vulnerable in high-dimensional domains such as vision and language, with forgetting growing with activation support overlap and exponentially with intrinsic data dimension. The work provides practical guidance for designing continual learning systems and offers a pathway to memory-aware model editing using KAN-based adapters.
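The link between localized activations and forgetting can be illustrated with a single univariate spline. The sketch below is a minimal toy, not the paper's setup. It assumes a degree-1 (hat) B-spline basis on a uniform grid, plain gradient descent on squared error, and hypothetical sine/cosine targets for two sequential tasks without replay. When the input ranges of the two tasks activate disjoint basis functions, training on task B leaves every coefficient that task A depends on untouched. When the ranges overlap, the shared coefficients are overwritten and task A's error rises.

```python
import numpy as np

def hat_basis(x, grid):
    """Degree-1 B-spline (hat) basis: column k peaks at grid[k] and is zero beyond its neighbors."""
    h = grid[1] - grid[0]  # uniform grid spacing
    return np.clip(1.0 - np.abs(x[:, None] - grid[None, :]) / h, 0.0, None)

def train(coef, x, y, grid, lr=0.5, steps=2000):
    """Plain gradient descent on mean squared error for y ~ hat_basis(x) @ coef."""
    B = hat_basis(x, grid)
    for _ in range(steps):
        grad = B.T @ (B @ coef - y) / len(x)  # zero for basis functions inactive on x
        coef = coef - lr * grad
    return coef

def mse(coef, x, y, grid):
    return float(np.mean((hat_basis(x, grid) @ coef - y) ** 2))

def forgetting(task_a_range, task_b_range, n_knots=21, seed=0):
    """Increase in task-A error after sequentially training on task B (hypothetical toy tasks)."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, n_knots)
    xa = rng.uniform(*task_a_range, 200)
    ya = np.sin(6 * xa)
    xb = rng.uniform(*task_b_range, 200)
    yb = np.cos(6 * xb)
    coef = train(np.zeros(n_knots), xa, ya, grid)  # learn task A
    before = mse(coef, xa, ya, grid)
    coef = train(coef, xb, yb, grid)               # then task B, no replay
    return mse(coef, xa, ya, grid) - before

print(forgetting((0.0, 0.4), (0.6, 1.0)))  # disjoint supports: task A is untouched
print(forgetting((0.0, 0.6), (0.4, 1.0)))  # overlapping supports: task A degrades
```

The grid spacing here is 0.05, so inputs in [0, 0.4] and [0.6, 1.0] never share a nonzero hat function. The first call should therefore report no forgetting up to floating-point rounding. A globally supported activation, such as a dense MLP layer, would not have this property, because every gradient step touches every weight.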
Abstract
Catastrophic forgetting is a longstanding challenge in continual learning, where models lose knowledge from earlier tasks when learning new ones. While various mitigation strategies have been proposed for Multi-Layer Perceptrons (MLPs), recent architectural advances such as Kolmogorov-Arnold Networks (KANs) have been suggested to offer intrinsic resistance to forgetting by leveraging localized spline-based activations. However, the practical behavior of KANs under continual learning remains unclear, and their limitations are not well understood. To address this, we present a comprehensive study of catastrophic forgetting in KANs and develop a theoretical framework that links forgetting to activation support overlap and intrinsic data dimension. We validate this analysis through systematic experiments on synthetic and vision tasks, measuring forgetting dynamics under varying model configurations and data complexity. Further, we introduce KAN-LoRA, a novel adapter design for parameter-efficient continual fine-tuning of language models, and evaluate its effectiveness in knowledge editing tasks. Our findings reveal that while KANs exhibit promising retention in low-dimensional algorithmic settings, they remain vulnerable to forgetting in high-dimensional domains such as image classification and language modeling. These results advance the understanding of KANs' strengths and limitations, offering practical insights for continual learning system design.
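The abstract does not specify KAN-LoRA's internal structure. The sketch below is therefore one hypothetical way a KAN-based LoRA adapter could be assembled, and it should not be read as the paper's design. It keeps a frozen pretrained weight `w0` and adds a low-rank update, as in standard LoRA, with one change: the linear down-projection is replaced by a KAN layer built from learnable per-edge hat-basis splines. The class names, the knot count, the grid range [-2, 2], and the zero initialization of the up-projection (borrowed from LoRA's zero-initialized B) are all illustrative assumptions.

```python
import numpy as np

class HatKANLayer:
    """Hypothetical KAN layer: edge (i, j) applies a learnable spline
    phi_ij(x_j) = sum_k coef[i, j, k] * hat_k(x_j); output i sums phi_ij over inputs j."""

    def __init__(self, d_in, d_out, n_knots=8, lo=-2.0, hi=2.0, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        self.grid = np.linspace(lo, hi, n_knots)
        self.coef = 0.1 * rng.standard_normal((d_out, d_in, n_knots))

    def basis(self, x):
        # x: (batch, d_in) -> (batch, d_in, n_knots); inputs outside [lo, hi] give zeros
        h = self.grid[1] - self.grid[0]
        return np.clip(1.0 - np.abs(x[..., None] - self.grid) / h, 0.0, None)

    def __call__(self, x):
        # Sum over input dimensions j and knots k -> (batch, d_out)
        return np.einsum("bjk,ijk->bi", self.basis(x), self.coef)

class KANLoRALinear:
    """Hypothetical adapter: y = x @ w0.T + kan_down(x) @ up.T, with w0 frozen.
    The rank-r bottleneck is a KAN layer instead of LoRA's linear map A."""

    def __init__(self, w0, rank=4, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        d_out, d_in = w0.shape
        self.w0 = w0                                  # frozen pretrained weight
        self.down = HatKANLayer(d_in, rank, rng=rng)  # trainable spline coefficients
        self.up = np.zeros((d_out, rank))             # trainable, zero-initialized

    def __call__(self, x):
        return x @ self.w0.T + self.down(x) @ self.up.T

rng = np.random.default_rng(1)
layer = KANLoRALinear(rng.standard_normal((6, 10)), rank=3)
x = rng.standard_normal((5, 10))
print(np.allclose(layer(x), x @ layer.w0.T))  # adapter starts as a no-op
```

Because `up` starts at zero, the adapted layer initially reproduces the frozen layer exactly. The trainable parameter count is `rank * d_in * n_knots + d_out * rank`, which stays small relative to `w0` when `rank` is small. Each spline coefficient affects only inputs near its knot, and this locality is the property the localized-activation argument above relies on.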
