Constructing sensible baselines for Integrated Gradients

Jai Bardhan, Cyrin Neeraj, Mihir Rawat, Subhadip Mitra

TL;DR

The paper addresses the interpretability of deep neural networks in high-energy physics by applying Integrated Gradients (IGs) with averaged baselines to a classifier that distinguishes signal from background events. IG attributions are defined as $\phi_i^{IG}(f,\mathbf{x},\mathbf{x'}) = (x_i - x_i') \int_{0}^{1} \frac{\partial f(\mathbf{x'} + \alpha(\mathbf{x}-\mathbf{x'}))}{\partial x_i} \, d\alpha$, and the averaged attribution is $\phi_i(f,\mathbf{x}) = \int \phi_i^{IG}(f,\mathbf{x},\mathbf{x'}) \, p_D(\mathbf{x'}) \, d\mathbf{x'}$. The baselines are drawn from the background distribution and weighted either uniformly or by cross section. The approach is tested on a collider event-classification task with a vectorlike quark signal. The results show that averaged baselines give more reasonable feature attributions than a blank baseline and highlight physically meaningful features such as $H_T$, $p_{T\ell}$, and $\slashed{E}_T$, suggesting broader applicability for model explanation, feature selection, and physics insight.
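Both formulas can be approximated numerically: the $\alpha$ integral with a midpoint Riemann sum along the straight path from $\mathbf{x'}$ to $\mathbf{x}$, and the integral over $p_D$ with a (weighted) average over sampled background events. The sketch below is not the authors' code. It assumes a toy logistic model with an analytic gradient standing in for the trained network, Gaussian stand-in "background" samples, and hypothetical per-event cross-section weights. The helper names `integrated_gradients` and `averaged_ig` are illustrative.

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=256):
    """Approximate phi^IG(f, x, x') with a midpoint Riemann sum over alpha in [0, 1]."""
    alphas = (np.arange(steps) + 0.5) / steps               # midpoints of [0, 1]
    path = baseline + alphas[:, None] * (x - baseline)       # points x' + alpha (x - x')
    avg_grad = np.mean([grad_f(p) for p in path], axis=0)    # average gradient along the path
    return (x - baseline) * avg_grad

def averaged_ig(grad_f, x, baselines, weights=None, steps=256):
    """Approximate phi(f, x) = E_{x'~p_D}[phi^IG(f, x, x')] by a weighted mean over sampled baselines."""
    if weights is None:
        weights = np.ones(len(baselines))                    # uniform weighting over background events
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                        # normalise so the weights act as p_D
    attrs = np.array([integrated_gradients(grad_f, x, b, steps) for b in baselines])
    return weights @ attrs

# Assumption: a toy logistic "classifier" with an analytic gradient replaces the trained DNN.
w = np.array([1.5, -0.7, 0.3])
c = -0.2

def f(x):
    return 1.0 / (1.0 + np.exp(-(x @ w + c)))

def grad_f(x):
    s = f(x)
    return s * (1.0 - s) * w

rng = np.random.default_rng(0)
x = np.array([2.0, 0.5, 1.0])                                # one event to explain
background = rng.normal(loc=0.5, scale=0.3, size=(50, 3))    # stand-in background events (assumed)
xsec = rng.uniform(0.1, 1.0, size=50)                        # hypothetical cross-section weights

phi_blank = integrated_gradients(grad_f, x, np.zeros(3))     # blank (zero-vector) baseline
phi_bg = averaged_ig(grad_f, x, background)                  # uniformly weighted background baselines
phi_bgw = averaged_ig(grad_f, x, background, weights=xsec)   # cross-section-weighted baselines
```

A useful sanity check is completeness: for a single baseline the attributions sum to $f(\mathbf{x}) - f(\mathbf{x'})$, so the averaged attributions sum to $f(\mathbf{x})$ minus the (weighted) mean model output over the background samples.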

Abstract

Machine learning methods have seen a meteoric rise in their applications in the scientific community. However, little effort has been put into understanding these "black box" models. Using an example case study in particle physics, we show how integrated gradients (IGs) can be applied to understand these models by designing different baselines. We find that the zero-vector baseline does not provide good feature attributions and that an averaged baseline sampled from the background events provides consistently more reasonable attributions.

Paper Structure

This paper contains 9 sections, 2 equations, and 5 figures.

Figures (5)

  • Figure 1: Top-$5$ features ranked by their attribution scores for each of the baselines.
  • Figure 2: Panel (a) shows the accuracy increasing as more of the top-$k$ features are included for each baseline. Panel (b) shows the signal sensitivity ($Z$) increasing as more of the top-$k$ features are included for each baseline. The averaged baselines consistently outperform the blank baseline for all values of $k$ up to $k\sim15$.
  • Figure 3: Top-$20$ features ranked by attribution for baseline B$_0$
  • Figure 4: Top-$20$ features ranked by attribution for baseline B$_{\text{bg}}$
  • Figure 5: Top-$20$ features ranked by attribution for baseline B$_{\text{bgw}}$
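The figures rank features by attribution, and Figure 2 tracks accuracy and signal sensitivity $Z$ as the top-$k$ features are added. A minimal sketch of the two ingredients follows, under stated assumptions. Per-event attributions are aggregated into a global score by mean absolute value, which is an assumed choice. $Z$ is computed with the common Asimov approximation $Z = \sqrt{2\left[(s+b)\ln(1+s/b) - s\right]}$, which is also assumed, since the exact definition is not given here. The feature names and attribution values are hypothetical.

```python
import numpy as np

def rank_features(attributions, names):
    """Rank features by mean absolute attribution over events (assumed aggregation)."""
    scores = np.mean(np.abs(attributions), axis=0)
    order = np.argsort(scores)[::-1]                 # highest score first
    return [(names[i], float(scores[i])) for i in order]

def top_k(ranking, k):
    """Names of the k highest-ranked features."""
    return [name for name, _ in ranking[:k]]

def asimov_significance(s, b):
    """Assumed sensitivity measure: Asimov median significance for s signal, b background events."""
    return np.sqrt(2.0 * ((s + b) * np.log1p(s / b) - s))

# Hypothetical per-event attributions (rows: events, columns: features).
names = ["H_T", "pT_lep", "MET", "eta_j1"]
attributions = np.array([[0.40,  0.20,  0.10,  0.01],
                         [0.35, -0.25,  0.15, -0.02],
                         [0.50,  0.10, -0.05,  0.00]])
ranking = rank_features(attributions, names)
print(top_k(ranking, 2))   # ['H_T', 'pT_lep']
```

To reproduce a curve like Figure 2, one would retrain the classifier on `top_k(ranking, k)` for increasing `k` and record accuracy and $Z$ at a chosen working point. That loop depends on the model and selection and is not shown. For $s \ll b$ the Asimov formula reduces to the familiar $s/\sqrt{b}$.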