Constructing sensible baselines for Integrated Gradients

Jai Bardhan; Cyrin Neeraj; Mihir Rawat; Subhadip Mitra

TL;DR

The paper addresses the interpretability of deep neural networks in high-energy physics by applying Integrated Gradients (IGs) with averaged baselines to a classifier that separates signal from background events. The IG attribution of feature $i$ for an input $\mathbf{x}$ relative to a baseline $\mathbf{x'}$ is defined as $\phi_i^{IG}(f,\mathbf{x},\mathbf{x'}) = (x_i - x_i') \int_{0}^{1} \frac{\partial f(\mathbf{x'} + \alpha(\mathbf{x}-\mathbf{x'}))}{\partial x_i} \, d\alpha$, and the averaged attribution is $\phi_i(f,\mathbf{x}) = \int \phi_i^{IG}(f,\mathbf{x},\mathbf{x'}) \, p_D(\mathbf{x'}) \, d\mathbf{x'}$, where $p_D$ is the distribution from which the baselines are drawn. Here the baselines are drawn from the background distribution and weighted either uniformly or by cross section, and the approach is tested on a collider event-classification task involving a vectorlike quark signal. The results show that averaged baselines provide more reasonable feature attributions than a blank (zero-vector) baseline and highlight physically meaningful features such as $H_T$, $p_{T\ell}$, and $\slashed{E}_T$, suggesting broader applicability for model explanation, feature selection, and physics insight.
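The two formulas above can be illustrated with a minimal numerical sketch. It is not the paper's implementation: the logistic `model`, its weights `w` and `b`, the Gaussian `background` sample, and the helper names are all assumptions made for illustration. The path integral is approximated with a midpoint Riemann sum, and the integral over $p_D$ is replaced by a weighted average over sampled baselines, where the optional `weights` argument stands in for uniform or cross-section weighting.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x, w, b):
    # Toy classifier f(x) standing in for the network (an assumption).
    return sigmoid(x @ w + b)

def model_grad(x, w, b):
    # Analytic gradient of the toy classifier with respect to x.
    p = model(x, w, b)
    return np.asarray(p * (1.0 - p))[..., None] * w

def integrated_gradients(x, baseline, w, b, steps=200):
    # Midpoint Riemann sum over alpha in [0, 1] along the straight path
    # from the baseline x' to the input x.
    alphas = (np.arange(steps) + 0.5) / steps
    path = baseline + alphas[:, None] * (x - baseline)
    avg_grad = model_grad(path, w, b).mean(axis=0)
    return (x - baseline) * avg_grad

def averaged_baseline_ig(x, baselines, w, b, weights=None, steps=200):
    # Weighted average of IG attributions over sampled baselines.
    # weights=None means uniform weighting; otherwise pass one weight
    # per baseline (for example, hypothetical cross-section weights).
    if weights is None:
        weights = np.ones(len(baselines))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    attrs = np.stack(
        [integrated_gradients(x, bl, w, b, steps) for bl in baselines]
    )
    return weights @ attrs

# Illustrative values only; none of these come from the paper.
rng = np.random.default_rng(0)
w = np.array([1.5, -0.7, 0.3])
b = -0.2
background = rng.normal(0.0, 1.0, size=(64, 3))
x = np.array([2.0, -1.0, 0.5])

phi = averaged_baseline_ig(x, background, w, b)
```

Because IG satisfies completeness for each baseline, the averaged attributions should sum to approximately `model(x, w, b) - model(background, w, b).mean()` under uniform weighting, which gives a quick sanity check on any implementation.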

Abstract

Machine learning methods have seen a meteoric rise in their applications in the scientific community. However, little effort has been put into understanding these "black box" models. Using a case study in particle physics, we show how integrated gradients (IGs) can be applied to understand these models by designing different baselines. We find that the zero-vector baseline does not provide good feature attributions, whereas an averaged baseline sampled from the background events provides consistently more reasonable attributions.
