Splines-Based Feature Importance in Kolmogorov-Arnold Networks: A Framework for Supervised Tabular Data Dimensionality Reduction

Ange-Clément Akazan, Verlon Roel Mbingui

TL;DR

The paper tackles feature selection for high-dimensional tabular data using Kolmogorov-Arnold Networks (KANs), which express feature transformations with trainable splines. It introduces four feature-importance criteria (KAN-L1, KAN-L2, KAN-SI, and KAN-KO), derived from the spline-parameter blocks and from gradient and majorization principles, and evaluates them against standard baselines on classification and regression benchmarks with leakage-safe cross-validation. Across diverse datasets, KAN-L2, KAN-SI, and KAN-KO often match or exceed traditional selectors, yield stable and non-redundant feature subsets, and offer interpretable nonlinear feature relevance, whereas KAN-L1 can over-prune in noisy or correlated settings. The findings position KAN-based feature selection as a robust, interpretable alternative to sparsity- or impurity-based methods for dimensionality reduction in real-world tabular tasks. Its main limitation is slower training than linear or tree-based baselines, which points to future work on accelerated KAN variants and distillation techniques.
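The two norm-based criteria can be illustrated with a minimal sketch. It assumes a hypothetical layout, not the authors' code: the first KAN layer stores its spline coefficients in an array `coef` of shape `(d_in, d_out, n_basis)`, so the block `coef[i]` holds every coefficient attached to input feature `i`, and a feature's score is the ℓ1 (KAN-L1-style) or ℓ2 (KAN-L2-style) norm of that block. The names `kan_norm_scores` and `top_k_features` are illustrative.

```python
import numpy as np


def kan_norm_scores(coef: np.ndarray, order: int = 2) -> np.ndarray:
    """Score each input feature by the norm of its spline-coefficient block.

    coef: hypothetical first-layer coefficient array of shape
          (d_in, d_out, n_basis); coef[i] is the block for input feature i.
    order: 1 gives an L1-style score, 2 an L2-style score.
    """
    if coef.ndim != 3:
        raise ValueError("expected coef with shape (d_in, d_out, n_basis)")
    blocks = coef.reshape(coef.shape[0], -1)  # flatten each feature's block
    return np.linalg.norm(blocks, ord=order, axis=1)


def top_k_features(scores: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k highest-scoring features, best first."""
    return np.argsort(scores)[::-1][:k]


# Toy example: 4 input features, 3 outputs, 5 basis functions per edge.
rng = np.random.default_rng(0)
coef = rng.normal(size=(4, 3, 5))
coef[2] *= 0.01  # feature 2 gets near-zero splines, so it should rank last
l1 = kan_norm_scores(coef, order=1)
l2 = kan_norm_scores(coef, order=2)
print(top_k_features(l2, k=2))
```

Only the first layer is scored because it is the only layer indexed directly by input features; attributing deeper-layer coefficients back to inputs would need an extra propagation step that this sketch does not attempt.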

Abstract

Feature selection is a key step in many tabular prediction problems, where multiple candidate variables may be redundant, noisy, or weakly informative. We investigate feature selection based on Kolmogorov-Arnold Networks (KANs), which parameterize feature transformations with splines and expose per-feature importance scores in a natural way. From this idea we derive four KAN-based selection criteria ($\ell_1$ and $\ell_2$ norms of the spline coefficients, gradient-based saliency, and knockout scores) and compare them with standard methods such as LASSO, Random Forest feature importance, Mutual Information, and SVM-RFE on a suite of real and synthetic classification and regression datasets. Using F1 (classification) and $R^2$ (regression) scores averaged across three feature-retention levels (20%, 40%, 60%), we find that KAN-based selectors are generally competitive with, and sometimes superior to, classical baselines. In classification, KAN criteria often match or exceed existing methods on multi-class tasks by removing redundant features and capturing nonlinear interactions. In regression, KAN-based scores provide robust performance on noisy and heterogeneous datasets, closely tracking strong ensemble predictors; we also observe characteristic failure modes, such as overly aggressive pruning with an $\ell_1$ criterion. Stability and redundancy analyses further show that KAN-based selectors yield reproducible feature subsets across folds while avoiding unnecessary correlation inflation, ensuring reliable and non-redundant variable selection. Overall, our findings demonstrate that KAN-based feature selection provides a powerful and interpretable alternative to traditional methods, capable of uncovering nonlinear and multivariate feature relevance beyond sparsity or impurity-based measures.
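Gradient-based saliency and knockout scores can be computed for any trained predictor. The sketch below is a hedged illustration, not the paper's KAN-SI or KAN-KO definitions: a hypothetical toy function `f` stands in for a trained KAN, input gradients are approximated by central finite differences instead of automatic differentiation, and knockout is taken as the increase in mean squared error when one feature is replaced by its mean.

```python
import numpy as np


def saliency_scores(f, X, eps=1e-4):
    """Mean absolute input gradient per feature, via central differences.

    f: callable mapping an (n, d) array to an (n,) array of predictions.
    """
    d = X.shape[1]
    scores = np.zeros(d)
    for i in range(d):
        step = np.zeros(d)
        step[i] = eps
        grad_i = (f(X + step) - f(X - step)) / (2 * eps)  # df/dx_i per sample
        scores[i] = np.mean(np.abs(grad_i))
    return scores


def knockout_scores(f, X, y):
    """Increase in mean squared error when feature i is set to its mean."""
    base_loss = np.mean((f(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for i in range(X.shape[1]):
        X_ko = X.copy()
        X_ko[:, i] = X[:, i].mean()  # remove the feature's variation
        scores[i] = np.mean((f(X_ko) - y) ** 2) - base_loss
    return scores


# Hypothetical stand-in for a trained KAN: nonlinear in x0, linear in x1,
# and independent of x2.
def f(X):
    return np.sin(2 * X[:, 0]) + 0.5 * X[:, 1]


rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 3))
y = f(X)
print(saliency_scores(f, X))
print(knockout_scores(f, X, y))
```

Both criteria assign zero importance to the unused feature `x2`. Replacing a feature by its mean is only one knockout choice; permuting the column or retraining without it are common alternatives with different costs.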

Paper Structure

This paper contains 41 sections, 45 equations, 8 figures, 10 tables.

Figures (8)

  • Figure 1: Average F1 score (averaged across 20/40/60% retained features) per selector for each classifier
  • Figure 2: Relative average $R^2$ score (averaged across 20/40/60% retained features) per selector for each regressor
  • Figure 3: Breast Cancer dataset interpretability visualizations using the mean concave points feature.
  • Figure 4: Breast Cancer dataset interpretability visualizations using the features selected by KAN-L2.
  • Figure 5: Breast Cancer dataset interpretability visualizations. (a) KAN first-layer responses for the top three features. (b) Logit sensitivity curve with respect to worst concave points.
  • ...and 3 more figures
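Figures 1 and 2 average each selector's score over three retention levels. A minimal NumPy sketch of such a protocol follows, under loudly stated assumptions: the importance function is refit on each training fold only, so test folds never influence the ranking (one way to make the evaluation leakage-safe); `score_fn`, `mean_gap_score`, and the nearest-centroid classifier are hypothetical placeholders; "F1" is taken as macro-F1; and stability is measured as the mean pairwise Jaccard similarity of the selected subsets across folds. None of these choices is confirmed by the paper.

```python
from itertools import combinations

import numpy as np


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(f1s))


def nearest_centroid(X_tr, y_tr, X_te):
    """Predict the class whose training centroid is closest (Euclidean)."""
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    dists = ((X_te[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dists, axis=1)]


def evaluate_selector(score_fn, X, y, fractions=(0.2, 0.4, 0.6), n_folds=5, seed=0):
    """Macro-F1 averaged over folds and retention levels, plus Jaccard stability."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    d = X.shape[1]
    f1_per_level, stability = {}, {}
    for frac in fractions:
        k = max(1, int(round(frac * d)))  # number of features retained
        fold_f1, subsets = [], []
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
            # Leakage-safe: importance is computed on the training fold only.
            ranking = np.argsort(score_fn(X[train_idx], y[train_idx]))[::-1]
            keep = ranking[:k]
            subsets.append(set(keep.tolist()))
            y_pred = nearest_centroid(X[train_idx][:, keep], y[train_idx], X[test_idx][:, keep])
            fold_f1.append(macro_f1(y[test_idx], y_pred))
        f1_per_level[frac] = float(np.mean(fold_f1))
        stability[frac] = float(np.mean(
            [len(a & b) / len(a | b) for a, b in combinations(subsets, 2)]))
    return float(np.mean(list(f1_per_level.values()))), f1_per_level, stability


def mean_gap_score(X, y):
    """Hypothetical stand-in selector: |class-mean gap| / feature std (binary y)."""
    gap = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
    return gap / (X.std(axis=0) + 1e-12)


# Synthetic data: only features 0 and 1 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
avg_f1, per_level, stab = evaluate_selector(mean_gap_score, X, y)
print(avg_f1, per_level, stab)
```

Any importance function with the signature `score_fn(X_train, y_train) -> scores` can be dropped in; a regression variant would swap macro-F1 for $R^2$ and the classifier for a regressor.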