KAN we improve on HEP classification tasks? Kolmogorov-Arnold Networks applied to an LHC physics example
Johannes Erdmann, Florian Mausolf, Jan Lukas Späh
TL;DR
The paper investigates Kolmogorov-Arnold Networks (KANs) as interpretable alternatives to multilayer perceptrons (MLPs) for binary event classification in high-energy physics (HEP). Using ttH vs. tH classification in the H→γγ channel at 14 TeV, with 22 input features and careful preprocessing, the authors compare KANs of varying depths and widths to MLPs. A one-layer KAN learns activations resembling univariate log-likelihood ratios, while deeper KANs develop more complex representations. The best KAN performs comparably to the best MLP (AUC ≈ 0.908), but very small KANs do not outperform small MLPs, and larger KANs are not more parameter efficient than their MLP counterparts. Overall, KANs can match MLP performance on this task while offering interpretability advantages in small configurations, which motivates further work on interpretability techniques and possible extensions to regression tasks in HEP.
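To make the notion of parameter efficiency concrete, the following is a rough sketch of how trainable parameters scale in the two architectures. It assumes that each KAN edge carries `grid_size + spline_order` B-spline coefficients, which is the leading-order scaling of the original KAN formulation; actual implementations add further per-edge parameters, and the paper's exact counting may differ. The function names, the layer widths, and the values `grid_size=5`, `spline_order=3` are hypothetical choices for illustration; only the 22 input features come from the text.

```python
def mlp_param_count(widths):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths[:-1], widths[1:]))


def kan_param_count(widths, grid_size=5, spline_order=3):
    """Leading-order KAN count: one spline per edge, each with
    grid_size + spline_order coefficients (an assumption; real
    implementations carry a few extra parameters per edge)."""
    per_edge = grid_size + spline_order
    return sum(n_in * n_out * per_edge for n_in, n_out in zip(widths[:-1], widths[1:]))


# 22 input features (from the text); hidden width 8 is a hypothetical example.
one_layer = [22, 1]
two_layer = [22, 8, 1]

for widths in (one_layer, two_layer):
    print(widths, "MLP:", mlp_param_count(widths), "KAN:", kan_param_count(widths))
```

Under these assumptions, a KAN has roughly `grid_size + spline_order` times as many parameters as an MLP of the same shape, so a KAN must reach a given performance with a considerably smaller shape to count as more parameter efficient.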
Abstract
Recently, Kolmogorov-Arnold Networks (KANs) have been proposed as an alternative to multilayer perceptrons, suggesting advantages in performance and interpretability. We study a typical binary event classification task in high-energy physics including high-level features and comment on the performance and interpretability of KANs in this context. Consistent with expectations, we find that the learned activation functions of a one-layer KAN resemble the univariate log-likelihood ratios of the respective input features. In deeper KANs, the activations in the first layer differ from those in the one-layer KAN, which indicates that the deeper KANs learn more complex representations of the data, a pattern commonly observed in other deep-learning architectures. We study KANs with different depths and widths and we compare them to multilayer perceptrons in terms of performance and number of trainable parameters. For the chosen classification task, we do not find that KANs are more parameter efficient. However, small KANs may offer advantages in terms of interpretability that come at the cost of only a moderate loss in performance.
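One way to see why the one-layer result is expected: a one-layer KAN with a single output sums one learned univariate function per input feature, and if the features were conditionally independent given the class, the optimal classifier logit would be the sum of the univariate log-likelihood ratios (up to a constant). The sketch below estimates such a univariate log-likelihood ratio from histograms. It uses toy Gaussian data, not the paper's dataset, and the function name `univariate_llr`, the binning, and the sample sizes are assumptions made for illustration.

```python
import numpy as np


def univariate_llr(x_sig, x_bkg, bins=20, value_range=(-4.0, 4.0), eps=1e-9):
    """Histogram estimate of log[p(x | signal) / p(x | background)] for one feature."""
    p_sig, edges = np.histogram(x_sig, bins=bins, range=value_range, density=True)
    p_bkg, _ = np.histogram(x_bkg, bins=edges, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # eps guards against log(0) in empty bins
    return centers, np.log((p_sig + eps) / (p_bkg + eps))


# Toy feature: unit-width Gaussians centred at +0.5 (signal) and -0.5 (background).
# For this choice the exact log-likelihood ratio is simply x.
rng = np.random.default_rng(0)
x_sig = rng.normal(loc=0.5, scale=1.0, size=200_000)
x_bkg = rng.normal(loc=-0.5, scale=1.0, size=200_000)

centers, llr = univariate_llr(x_sig, x_bkg)
```

Plotting `llr` against `centers` for each input feature gives the reference curves against which learned one-layer KAN activations could be compared, keeping in mind that a learned activation is only expected to match up to an additive constant and the effect of feature correlations.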
