KAN we improve on HEP classification tasks? Kolmogorov-Arnold Networks applied to an LHC physics example

Johannes Erdmann, Florian Mausolf, Jan Lukas Späh

TL;DR

The paper investigates Kolmogorov-Arnold Networks (KANs) as interpretable alternatives to multilayer perceptrons (MLPs) for binary event classification in high-energy physics (HEP). The benchmark is the separation of ttH from tH production in the H→γγ channel at 14 TeV, using 22 preprocessed input features, and the authors compare KANs of varying depths and widths to MLPs. A one-layer KAN learns activation functions that resemble the univariate log-likelihood ratios of its inputs, while deeper KANs develop more complex representations. The best KAN performs comparably to the best MLP (AUC ≈ 0.908), but very small KANs do not outperform small MLPs, and larger KANs are no more parameter efficient than their MLP counterparts. Overall, KANs can match MLP performance on this task, and small KANs offer interpretability advantages at a moderate cost in performance, which motivates further work on interpretability techniques and on extensions to regression tasks in HEP.
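To make the parameter-efficiency comparison concrete, the following is a minimal sketch of how trainable parameters might be counted for the network structures named in the figure captions (22--1, 22--45--1, 22--10--5--2--1). It assumes that each KAN edge carries a B-spline with `grid + order` coefficients and ignores any per-edge scale factors or biases that a particular implementation may add. The values `grid=5` and `order=3`, and the function names, are illustrative assumptions rather than settings taken from the paper, so the paper's own counts may differ.

```python
def mlp_params(widths):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(widths, widths[1:]))


def kan_params(widths, grid=5, order=3):
    """Spline coefficients of a KAN, assuming (grid + order) coefficients per edge.

    Assumption: implementation-specific extras (scales, biases) are not counted.
    """
    return sum(n_in * n_out * (grid + order) for n_in, n_out in zip(widths, widths[1:]))


# Structures quoted in the figure captions; the MLP with the same widths is shown for scale.
for widths in ([22, 1], [22, 45, 1], [22, 10, 5, 2, 1]):
    name = "-".join(str(w) for w in widths)
    print(f"{name}: KAN ~ {kan_params(widths)}, MLP = {mlp_params(widths)}")
```

Under these assumptions a KAN has roughly `grid + order` times as many parameters as an MLP of the same shape, so a comparison at equal parameter count pits a given KAN against a considerably wider or deeper MLP.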

Abstract

Recently, Kolmogorov-Arnold Networks (KANs) have been proposed as an alternative to multilayer perceptrons, suggesting advantages in performance and interpretability. We study a typical binary event classification task in high-energy physics including high-level features and comment on the performance and interpretability of KANs in this context. Consistent with expectations, we find that the learned activation functions of a one-layer KAN resemble the univariate log-likelihood ratios of the respective input features. In deeper KANs, the activations in the first layer differ from those in the one-layer KAN, which indicates that the deeper KANs learn more complex representations of the data, a pattern commonly observed in other deep-learning architectures. We study KANs with different depths and widths and we compare them to multilayer perceptrons in terms of performance and number of trainable parameters. For the chosen classification task, we do not find that KANs are more parameter efficient. However, small KANs may offer advantages in terms of interpretability that come at the cost of only a moderate loss in performance.
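
The link between a one-layer KAN and univariate log-likelihood ratios can be illustrated with a short sketch. A one-layer KAN with a sigmoid output computes sigmoid(sum_i phi_i(x_i)); if the features were independent within each class, the optimal log-odds would be the sum over features of log(p_sig(x_i) / p_bkg(x_i)) plus a constant, so each learned phi_i would be expected to approximate one such ratio. The code below estimates that ratio from histograms for a single toy feature. The Gaussian toy data, the binning, and the helper name `univariate_llr` are assumptions made for illustration and are not taken from the paper.

```python
import math
import random
from bisect import bisect_right


def univariate_llr(signal, background, edges, eps=1e-9):
    """Binned estimate of log(p_sig(x) / p_bkg(x)) for a single feature."""

    def binned_pdf(values):
        counts = [0] * (len(edges) - 1)
        for v in values:
            # Out-of-range values are clipped into the first or last bin.
            i = min(max(bisect_right(edges, v) - 1, 0), len(counts) - 1)
            counts[i] += 1
        return [c / len(values) for c in counts]

    p_sig, p_bkg = binned_pdf(signal), binned_pdf(background)
    # eps guards against log(0) in empty bins.
    return [math.log((s + eps) / (b + eps)) for s, b in zip(p_sig, p_bkg)]


# Toy feature (not the paper's data): unit-width Gaussians centred at +1 and -1.
rng = random.Random(0)
sig = [rng.gauss(+1.0, 1.0) for _ in range(20000)]
bkg = [rng.gauss(-1.0, 1.0) for _ in range(20000)]
edges = [-3.0 + 0.5 * i for i in range(13)]  # 12 bins on [-3, 3]

llr = univariate_llr(sig, bkg, edges)
for lo, hi, value in zip(edges, edges[1:], llr):
    print(f"[{lo:+.1f}, {hi:+.1f}): {value:+.2f}")
```

For this toy setup the exact ratio is the straight line 2x, so the binned estimate should rise roughly linearly through zero. Correlated features break the independence assumption, which is one way to read the abstract's statement that first-layer activations in deeper KANs differ from those of the one-layer KAN.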

Paper Structure (5 sections, 5 equations, 8 figures)

Figures (8)

  • Figure 1: Distributions of ten example features used for the classification. For distributions with overflow, the overflow is included in the last bin.
  • Figure 2: Matrix of the Pearson correlation coefficients of all 22 input features. The upper triangle refers to the $tH$ dataset and the lower triangle refers to the $t\bar{t}H$ dataset. Off-diagonal coefficients with absolute values of at least $10\,\%$ are shown as numbers on the plot.
  • Figure 3: Evolution of the loss (upper row) and the accuracy (lower row) of three KAN models of depth one, two and four, respectively. Due to the early-stopping approach, the epoch from which the model parameters are used appears 25 epochs before the end of the optimization. Instabilities occur in training epochs where the spline domains of multi-layer KANs are adapted.
  • Figure 4: Output distributions on the test dataset for the two classes for three KANs with structures 22--1, 22--45--1 and 22--10--5--2--1, respectively.
  • Figure 5: Graphical representation of the trained KAN with a single layer (KAN 22--1). The red curves represent the learned activation functions, while the blue curve shows the sigmoid function used to normalize the network output. The $L_1$-norm of each spline is given, which also defines the grayscale of each edge.
  • ...and 3 more figures
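
The layout described in the Figure 2 caption, with one dataset in the upper triangle and the other in the lower triangle, can be sketched as follows. This is a minimal illustration on toy data: the function names, the three toy features, and their correlations are assumptions and do not reproduce the paper's 22 features. Only the 10% labelling threshold is taken from the caption.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def split_correlation_matrix(upper_cols, lower_cols):
    """Upper triangle from the first dataset, lower triangle from the second.

    Both arguments are lists of feature columns in the same feature order.
    """
    n = len(upper_cols)
    matrix = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = pearson(upper_cols[i], upper_cols[j])
            matrix[j][i] = pearson(lower_cols[i], lower_cols[j])
    return matrix


# Toy stand-ins for the two datasets (hypothetical, three features each).
rng = random.Random(1)
n_events = 5000
a0 = [rng.gauss(0, 1) for _ in range(n_events)]
a1 = [x + 0.5 * rng.gauss(0, 1) for x in a0]  # strongly correlated with a0
a2 = [rng.gauss(0, 1) for _ in range(n_events)]
b0, b1, b2 = ([rng.gauss(0, 1) for _ in range(n_events)] for _ in range(3))

matrix = split_correlation_matrix([a0, a1, a2], [b0, b1, b2])

# Only off-diagonal coefficients of at least 10% in absolute value get a label.
shown = [
    (i, j, round(matrix[i][j], 2))
    for i in range(3)
    for j in range(3)
    if i != j and abs(matrix[i][j]) >= 0.10
]
print(shown)
```

Placing both datasets in one matrix makes it easy to spot feature pairs whose correlation differs between the two classes, which is information a purely univariate view of the inputs cannot capture.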