A Step Toward Interpretability: Smearing the Likelihood
Andrew J. Larkoski
TL;DR
The paper defines interpretability in particle-physics ML as isolating the physical energy scales that a model exploits. This is implemented by smearing data with a metric distance $d(\cdot,\cdot)$ that carries units of energy and analyzing the smeared likelihood $\mathcal{L}(\vec{x}|\epsilon)$. It adopts the $p=2$ Spectral Energy Mover's Distance as the IRC-safe metric, constructs smeared distributions, and studies discrimination power for quark–gluon jets as a function of the resolution $\epsilon$. Through empirical observations and extreme value theory, it shows that the minimal resolvable scale shrinks with dataset size following a power law with a slowly varying exponent, implying that practical models must extrapolate to perturbative (and potentially nonperturbative) emission scales. Discrimination power of the smeared likelihood improves as $\epsilon$ decreases, and the approach provides a physics-grounded path toward interpretability, albeit with substantial compute costs. The work suggests extensions to other discrimination problems and a renormalization-group-like flow of smeared observables, highlighting both the potential impact and the computational challenges of this interpretability paradigm.
Abstract
The problem of interpretability of machine learning architectures in particle physics has no agreed-upon definition, much less any proposed solution. We present a first modest step toward these goals by proposing a definition and corresponding practical method for isolation and identification of relevant physical energy scales exploited by the machine. This is accomplished by smearing, or averaging, over all input events that lie within a prescribed metric energy distance of one another, which correspondingly renders any quantity measured on a finite, discrete dataset continuous over the dataspace. Within this approach, we are able to explicitly demonstrate that (approximate) scaling laws are a consequence of extreme value theory applied to analysis of the distribution of the irreducible minimal distance over which a machine must extrapolate given a finite dataset. As an example, we study quark versus gluon jet identification, construct the smeared likelihood, and show that discrimination power steadily increases as the resolution scale decreases, indicating that the true likelihood for the problem is sensitive to emissions at all scales.
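To make the idea of smearing at a resolution $\epsilon$ concrete, the following is a minimal toy sketch, not the construction used in the paper. It assumes one-dimensional "events" drawn from two unit-width Gaussians and uses the absolute difference as a stand-in for the metric $d(\cdot,\cdot)$ (the paper uses the $p=2$ Spectral Energy Mover's Distance on jets). The function names `smeared_score`, `auc`, and `auc_vs_resolution` are hypothetical, and the score $n_{\rm sig}/(n_{\rm sig}+n_{\rm bkg})$ is an illustrative choice that is monotonic in the ratio of counts inside the ball of radius $\epsilon$ when both samples have equal size.

```python
import numpy as np


def smeared_score(x, sig, bkg, eps):
    """Toy smeared classifier score at point x for resolution eps.

    Counts training events of each class within distance eps of x, using
    |x - x'| as a stand-in metric (an assumption; not the SEMD). Returns
    n_sig / (n_sig + n_bkg), or 0.5 if no training event lies inside the
    ball, i.e. nothing is resolved at that scale.
    """
    n_sig = np.count_nonzero(np.abs(sig - x) < eps)
    n_bkg = np.count_nonzero(np.abs(bkg - x) < eps)
    total = n_sig + n_bkg
    return 0.5 if total == 0 else n_sig / total


def auc(sig_scores, bkg_scores):
    """Area under the ROC curve from pairwise comparisons, ties count 1/2."""
    s = np.asarray(sig_scores)[:, None]
    b = np.asarray(bkg_scores)[None, :]
    return float(np.mean((s > b) + 0.5 * (s == b)))


def auc_vs_resolution(eps_values, n_train=2000, n_test=500, seed=0):
    """Evaluate the toy smeared score on held-out events for each eps."""
    rng = np.random.default_rng(seed)
    # Toy classes: two unit-width Gaussians separated by one unit (assumption).
    sig_train = rng.normal(1.0, 1.0, n_train)
    bkg_train = rng.normal(0.0, 1.0, n_train)
    sig_test = rng.normal(1.0, 1.0, n_test)
    bkg_test = rng.normal(0.0, 1.0, n_test)

    results = {}
    for eps in eps_values:
        s = [smeared_score(x, sig_train, bkg_train, eps) for x in sig_test]
        b = [smeared_score(x, sig_train, bkg_train, eps) for x in bkg_test]
        results[eps] = auc(s, b)
    return results


if __name__ == "__main__":
    for eps, value in auc_vs_resolution([100.0, 1.0, 0.1]).items():
        print(f"eps = {eps:6.1f}   AUC = {value:.3f}")
```

In this toy, a resolution much larger than the spread of the data places every training event inside every ball, so the score is the same for all events and carries no discriminating information. Reducing $\epsilon$ lets the score resolve where the two classes differ, until the finite sample size leaves too few events inside each ball; this is the regime where a model has to extrapolate.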
