Interpretable Clinical Classification with Kolmogorov-Arnold Networks

Alejandro Almodóvar, Patricia A. Apellániz, Alba Garrido, Fernando Fernández-Salvador, Santiago Zazo, Juan Parras

TL;DR

This work tackles the challenge of trustworthy AI in clinical practice by introducing Kolmogorov-Arnold Networks (KANs) for tabular health data, with two interpretable variants: Logistic-KAN, a flexible generalization of logistic regression, and KAAM, an additively separable model that yields symbolic, inspectable formulas. Across six public clinical datasets, the proposed models achieve competitive predictive performance while delivering built-in interpretability through tools such as partial dependence plots, feature importance in the logit space, probability radar plots, and nearest-patient retrieval, avoiding post-hoc explanations. Extensive experiments show Logistic-KAN often attaining the highest mean reciprocal rank and KAAM delivering strong ROC-AUC and precision, with statistical analyses confirming robustness. The work also demonstrates practical interpretability via symbolic logit expressions and interactive interfaces, supporting clinician trust and auditability, and provides open-source code to facilitate adoption and reproducibility.

Abstract

Why should a clinician trust an Artificial Intelligence (AI) prediction? Despite the increasing accuracy of machine learning methods in medicine, the lack of transparency continues to hinder their adoption in clinical practice. In this work, we explore Kolmogorov-Arnold Networks (KANs) for clinical classification tasks on tabular data. In contrast to traditional neural networks, KANs are function-based architectures that offer intrinsic interpretability through transparent, symbolic representations. We introduce Logistic-KAN, a flexible generalization of logistic regression, and the Kolmogorov-Arnold Additive Model (KAAM), a simplified additive variant that delivers transparent, symbolic formulas. Unlike "black-box" models that require post-hoc explainability tools, our models support built-in patient-level insights, intuitive visualizations, and nearest-patient retrieval. Across multiple health datasets, our models match or outperform standard baselines while remaining fully interpretable. These results position KANs as a promising step toward trustworthy AI that clinicians can understand, audit, and act upon. We release the code for reproducibility.
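
The abstract calls KAAM an additive variant and Logistic-KAN a generalization of logistic regression. One way to read the additive structure is sketched here. It is an assumption based on the generic form of an additive model, not the paper's exact formulation, and the symbols $z_c$, $b_c$, $f_{c,j}$, and $w_{c,j}$ are introduced only for illustration:

```latex
z_c(\mathbf{x}) = b_c + \sum_{j=1}^{d} f_{c,j}(x_j),
\qquad
p(y = c \mid \mathbf{x}) = \frac{\exp\bigl(z_c(\mathbf{x})\bigr)}{\sum_{c'} \exp\bigl(z_{c'}(\mathbf{x})\bigr)}
```

Under this reading, ordinary (multinomial) logistic regression is the special case $f_{c,j}(x_j) = w_{c,j}\, x_j$. Because every term depends on a single covariate, each $f_{c,j}$ can be plotted on its own or replaced by a symbolic expression without affecting the other terms.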

Paper Structure

This paper contains 29 sections, 19 equations, 8 figures, 8 tables.

Figures (8)

  • Figure 1: Partial dependence plots (PDPs) for a test patient in the Diabetes-130 dataset using the KAAM model. Each subplot illustrates how an individual covariate affects the predicted probability for a specific class. Blue lines represent the KAAM-learned covariate contribution, orange dots mark the current patient's values, light blue dots correspond to the training cohort, and green dots show the nearest neighbors.
  • Figure 2: Importance plot for the patients in the Diabetes-130 dataset using the KAAM model. Not all covariates are equally important for predicting each class, although some covariates (e.g., Age or Insulin) are important across classes. Deviations between train and test patients are minimal.
  • Figure 3: Probability radar plots (PRPs) for a representative test patient from the Diabetes-130 dataset, showing the individual contribution of each covariate to the predicted class probabilities. Each axis corresponds to a covariate. The orange polygon represents the class probability when the covariate is fixed to the patient's actual value and all other features are set to their population averages. The blue polygon represents the average patient (mean value for all features), and the green polygon corresponds to the closest training sample. In Class 2, for example, the patient's num_procedures significantly increases the risk, while num_medications decreases it. The similarity in shape between the orange and green polygons provides an intuitive indication of how familiar or atypical a patient is compared to historical data.
  • Figure 4: Predicted probabilities for test patients in the Heart dataset using KAAM. Each bar represents one patient, colored by true class (red: negative, green: positive). Predictions are shown using the original B-spline model (left) and its symbolic approximation (right). The dashed line denotes the classification threshold (0.5).
  • Figure 5: Importance of the covariates in the Heart dataset predicted using the KAAM variance and SHAP. Both methods yield very similar results, emphasizing the same features, which confirms the correctness of the proposed method and aligns with clinical intuition (i.e., Age is a good predictor of heart disease).
  • ...and 3 more figures
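
The captions above describe three computations that are easy to state for any additive model: the radar-plot probabilities of Figure 3 (one covariate fixed to the patient's value, all others at the population mean), a variance-based importance in logit space (Figures 2 and 5), and nearest-patient retrieval. The following is a minimal sketch under stated assumptions, not the authors' implementation. The toy shape functions, the coefficient arrays `A` and `B`, the choice of variance of per-feature logit contributions as the importance score, and the Euclidean distance are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES, N_CLASSES = 3, 2

# Hypothetical coefficients of toy shape functions; a trained KAAM would
# supply its own learned univariate functions instead.
A = rng.normal(size=(N_FEATURES, N_CLASSES))
B = rng.normal(size=(N_FEATURES, N_CLASSES))


def contributions(X):
    """Per-feature contribution to each class logit, shape (n, d, C)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return A * np.sin(X)[:, :, None] + B * (X ** 2)[:, :, None]


def predict_proba(X):
    """Softmax over the summed per-feature contributions, shape (n, C)."""
    logits = contributions(X).sum(axis=1)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)


def radar_probabilities(x, reference):
    """Row j: class probabilities with feature j set to x[j] and every
    other feature set to the reference (e.g. population mean) value."""
    x = np.asarray(x, dtype=float)
    probes = np.tile(np.asarray(reference, dtype=float), (len(x), 1))
    idx = np.arange(len(x))
    probes[idx, idx] = x
    return predict_proba(probes)


def logit_importance(X):
    """Assumed importance score: variance of each feature's logit
    contribution over a cohort, shape (d, C)."""
    return contributions(X).var(axis=0)


def nearest_patients(x, X_train, k=3):
    """Indices of the k training rows closest to x (Euclidean distance,
    an assumption; features are taken to be on comparable scales)."""
    dist = np.linalg.norm(X_train - x, axis=1)
    return np.argsort(dist)[:k]


# Synthetic cohort, for illustration only.
X_train = rng.normal(size=(50, N_FEATURES))
patient = X_train[7]
mean_patient = X_train.mean(axis=0)

radar = radar_probabilities(patient, mean_patient)   # orange polygon
baseline = predict_proba(mean_patient)               # blue polygon
importance = logit_importance(X_train)
neighbors = nearest_patients(patient, X_train, k=3)
```

Each row of `radar` gives the value plotted on one axis of the radar chart for every class. Evaluating `radar_probabilities` at a retrieved neighbor instead of `patient` would give the green polygon described in the Figure 3 caption.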