A Step Toward Interpretability: Smearing the Likelihood

Andrew J. Larkoski

TL;DR

The paper defines interpretability in particle-physics ML as isolating the physical energy scales that a model exploits. This is implemented by smearing data with a metric distance $d(\cdot,\cdot)$ that carries units of energy and analyzing the smeared likelihood $\mathcal{L}(\vec{x}|\epsilon)$. It adopts the $p=2$ Spectral Energy Mover's Distance as the IRC-safe metric, constructs smeared distributions, and studies discrimination power for quark–gluon jets as a function of the resolution $\epsilon$. Through empirical observations and extreme value theory, it shows that the minimal resolvable scale shrinks with dataset size following a power law with a slowly varying exponent, implying that practical models must extrapolate to perturbative (and potentially nonperturbative) emission scales. Discrimination improves as $\epsilon$ decreases, and the smeared approach provides a physics-grounded path toward interpretability, albeit with substantial compute costs. The work suggests extensions to other discrimination problems and a renormalization-group-like flow of smeared observables, highlighting both the potential impact and the computational challenges of this interpretability paradigm.

Abstract

The problem of interpretability of machine learning architecture in particle physics has no agreed-upon definition, much less any proposed solution. We present a first modest step toward these goals by proposing a definition and corresponding practical method for isolation and identification of relevant physical energy scales exploited by the machine. This is accomplished by smearing or averaging over all input events that lie within a prescribed metric energy distance of one another and correspondingly renders any quantity measured on a finite, discrete dataset continuous over the dataspace. Within this approach, we are able to explicitly demonstrate that (approximate) scaling laws are a consequence of extreme value theory applied to analysis of the distribution of the irreducible minimal distance over which a machine must extrapolate given a finite dataset. As an example, we study quark versus gluon jet identification, construct the smeared likelihood, and show that discrimination power steadily increases as resolution decreases, indicating that the true likelihood for the problem is sensitive to emissions at all scales.
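
The smearing idea described in the abstract can be illustrated with a toy sketch. It is not the paper's implementation. The function name `smeared_score`, the ratio form of the score, and the one-dimensional metric $|x-y|$ are all assumptions made for illustration; the paper uses the $p=2$ Spectral Energy Mover's Distance on jets, and its exact definition of the smeared likelihood may differ. For a query point, the sketch counts how many events of each class lie within distance $\epsilon$ and reports the fraction that belong to the first class.

```python
import random

def smeared_score(x, quark, gluon, eps, dist=lambda a, b: abs(a - b)):
    """Toy smeared score: fraction of events within eps of x that are 'quark'.

    Hypothetical helper, not the paper's definition. `dist` stands in for
    a metric with units of energy; here it defaults to |a - b| on floats.
    Returns None when no event of either class lies within eps.
    """
    n_q = sum(1 for y in quark if dist(x, y) < eps)
    n_g = sum(1 for y in gluon if dist(x, y) < eps)
    total = n_q + n_g
    return n_q / total if total else None

# Toy stand-ins for the two jet samples: two overlapping Gaussians.
rng = random.Random(0)
toy_quark = [rng.gauss(+1.0, 1.0) for _ in range(2000)]
toy_gluon = [rng.gauss(-1.0, 1.0) for _ in range(2000)]

# The score at a fixed point as the smearing distance shrinks.
for eps in (2.0, 0.5, 0.1):
    print(eps, smeared_score(1.0, toy_quark, toy_gluon, eps))
```

When $\epsilon$ is smaller than the distance from the query point to its nearest neighbor, the sketch returns `None`, which is the toy analogue of the irreducible minimal distance that a finite dataset imposes.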

Paper Structure (11 sections, 37 equations, 3 figures)

Figures (3)

  • Figure 1: Left: distribution of the minimal distance in GeV from each gluon jet to any quark jet (blue) and from each quark jet to any gluon jet (orange), on the 20000+20000 jet dataset. Right: mean minimum distance $\langle\epsilon_{\min}\rangle$ in GeV for gluon-to-quark (blue) and quark-to-gluon (orange) jet distances, as a function of the number of events $n$ in the dataset. Points are displaced by $\pm 5\%$ from the true number of events for visibility, and the vertical lines represent $\pm 1$ standard deviation about the means. The power-law fit $\langle \epsilon_{\min}\rangle = 19.4\,n^{-0.12}$ GeV is shown as a dashed black line.
  • Figure 2: Distributions of the smeared likelihood score on 20000 events each of gluon jets (blue) and quark jets (orange). From top left to bottom, the smearing distance $\epsilon$ takes the values $30, 25, 20, 15, 10$ GeV.
  • Figure 3: ROC curves of quark jet efficiency as a function of gluon jet efficiency from a sliding cut on the smeared likelihood. The smearing distance takes the values $\epsilon = 30, 25, 20, 15, 10$ GeV from the top curve to the bottom. Better discrimination performance lies toward the lower right of the plot.
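
The power-law fit quoted in the Figure 1 caption can be evaluated directly to get a feel for how slowly the minimal distance shrinks. The short sketch below does only that arithmetic. The function names are hypothetical, and it assumes the fit with fixed exponent $0.12$ holds beyond the plotted range, which the source does not guarantee (the TL;DR describes the exponent as slowly varying).

```python
def mean_min_distance(n, amplitude=19.4, exponent=0.12):
    """Evaluate the quoted fit <eps_min> = 19.4 * n**(-0.12), in GeV."""
    return amplitude * n ** (-exponent)

def factor_to_halve(exponent=0.12):
    """Multiplicative increase in n needed to halve <eps_min>,
    assuming the exponent stays fixed (an extrapolation)."""
    return 2.0 ** (1.0 / exponent)

print(mean_min_distance(20000))  # mean minimal distance at n = 20000, in GeV
print(factor_to_halve())         # growth factor in n to halve that distance
```

Under these assumptions, the fit gives a mean minimal distance of roughly 5.9 GeV at $n = 20000$, and halving it would require a dataset several hundred times larger.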