Energy Flow Networks: Deep Sets for Particle Jets

Patrick T. Komiske, Eric M. Metodiev, Jesse Thaler

TL;DR

This paper addresses learning from collider events, treated as variable-length, unordered sets of particles, by introducing Energy Flow Networks (EFNs) and Particle Flow Networks (PFNs) based on the Deep Sets paradigm. It decomposes observables as $\mathcal{O}(\{p_i\}) = F\Big(\sum_i \Phi(p_i)\Big)$. It then specializes this to infrared- and collinear- (IRC-) safe observables via $\mathcal{O}(\{p_i\}) = F\Big(\sum_i z_i\,\Phi(\hat{p}_i)\Big)$, where $z_i$ is the energy (or transverse-momentum) weight of particle $i$ and $\hat{p}_i$ is its angular information. This single framework unifies detector images and radiation moments. The authors demonstrate competitive quark/gluon jet discrimination with EFNs and PFNs. They also show interpretable latent-space visualizations that reflect the collinear structure of QCD, and they extract closed-form observables (e.g., $A_{r_0}$, $B_{r_1,\beta}$, and $C(A,B)$) from trained models, bridging learned representations and analytic physics. These methods offer scalable, permutation-invariant tools for a broad range of LHC analyses, with potential extensions to pileup mitigation and event-level learning. Overall, the work provides a principled, interpretable approach to set-based learning in high-energy physics. It preserves theoretical properties such as IRC safety while delivering practical gains in performance and insight.
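The IRC-safe decomposition can be made concrete with a minimal NumPy sketch. It rests on two assumptions. First, `phi` and `F` below are random, untrained placeholders, not a trained EFN and not the EnergyFlow package API. Second, $z_i$ is taken to be the transverse-momentum fraction $p_{T,i}/\sum_j p_{T,j}$. The sketch checks numerically that three changes leave the observable unchanged: reordering the particles, adding a particle with exactly zero momentum, and splitting a particle into exactly collinear pieces.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a learned per-particle map Phi: (y, phi) -> R^4,
# built from random weights of a one-hidden-layer network (not trained).
W1 = rng.normal(size=(2, 16))
b1 = rng.normal(size=16)
W2 = rng.normal(size=(16, 4))

def phi(angles):
    """Map angular coordinates, shape (M, 2), to latent vectors, shape (M, 4)."""
    return np.tanh(angles @ W1 + b1) @ W2

def F(latent):
    """Hypothetical event-level function F: R^4 -> R (any function works here)."""
    return float(np.sum(latent ** 2))

def efn_observable(pts, angles):
    """O({p_i}) = F(sum_i z_i Phi(hat p_i)), assuming z_i = pT_i / sum_j pT_j."""
    z = pts / pts.sum()
    return F((z[:, None] * phi(angles)).sum(axis=0))

# A toy three-particle jet: transverse momenta and (y, phi) positions.
pts = np.array([100.0, 40.0, 10.0])
angles = np.array([[0.0, 0.0], [0.1, -0.2], [-0.3, 0.25]])
base = efn_observable(pts, angles)

# Permutation invariance: reorder the particles.
perm = [2, 0, 1]
shuffled = efn_observable(pts[perm], angles[perm])

# Infrared safety: add a particle carrying exactly zero transverse momentum.
soft = efn_observable(np.append(pts, 0.0), np.vstack([angles, [[0.5, 0.5]]]))

# Collinear safety: split particle 0 (pT = 100) into 70 + 30 at the same angles.
split = efn_observable(np.array([70.0, 30.0, 40.0, 10.0]),
                       np.vstack([angles[:1], angles]))

print(np.isclose(base, shuffled), np.isclose(base, soft), np.isclose(base, split))
```

All three invariances come from the structure of the sum alone. $\Phi$ sees only angles, and the weights $z_i$ are linear in momentum, so $F$ can be arbitrary. A soft particle with small but nonzero momentum shifts the latent sum by an amount proportional to its $z$, so the effect vanishes smoothly as that momentum goes to zero.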

Abstract

A key question for machine learning approaches in particle physics is how to best represent and learn from collider events. As an event is intrinsically a variable-length unordered set of particles, we build upon recent machine learning efforts to learn directly from sets of features or "point clouds". Adapting and specializing the "Deep Sets" framework to particle physics, we introduce Energy Flow Networks, which respect infrared and collinear safety by construction. We also develop Particle Flow Networks, which allow for general energy dependence and the inclusion of additional particle-level information such as charge and flavor. These networks feature a per-particle internal (latent) representation, and summing over all particles yields an overall event-level latent representation. We show how this latent space decomposition unifies existing event representations based on detector images and radiation moments. To demonstrate the power and simplicity of this set-based approach, we apply these networks to the collider task of discriminating quark jets from gluon jets, finding similar or improved performance compared to existing methods. We also show how the learned event representation can be directly visualized, providing insight into the inner workings of the model. These architectures lend themselves to efficiently processing and analyzing events for a wide variety of tasks at the Large Hadron Collider. Implementations and examples of our architectures are available online in our EnergyFlow package.
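The abstract describes three stages: per-particle latent representations, a sum over particles, and an event-level map. These can be sketched as a forward pass in the style of the PFN networks illustrated in Figure 4, with latent dimension $\ell = 8$ and a two-class softmax output. This is a minimal, untrained NumPy sketch. The layer widths, ReLU activations, random weights, and the encoding of PID as a single numeric feature are illustrative assumptions, not the architecture or code from the EnergyFlow package.

```python
import numpy as np

rng = np.random.default_rng(1)
ELL = 8  # latent dimension, matching the illustrative case in Figure 4

def dense(n_in, n_out):
    """Random (untrained) weights and zero biases for one fully connected layer."""
    return rng.normal(scale=0.5, size=(n_in, n_out)), np.zeros(n_out)

# Phi: per-particle features (y, phi, z, pid) -> R^ELL.
# F: R^ELL -> 2 class scores (signal, background).
PHI_LAYERS = [dense(4, 32), dense(32, ELL)]
F_LAYERS = [dense(ELL, 32), dense(32, 2)]

def mlp(x, layers):
    """Apply dense layers, using ReLU on every layer except the last."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

def softmax(s):
    e = np.exp(s - s.max())  # subtract the max for numerical stability
    return e / e.sum()

def pfn_forward(particles):
    """particles: array (M, 4) of (y, phi, z, pid) for one jet, any M >= 1."""
    latent_event = mlp(particles, PHI_LAYERS).sum(axis=0)  # sum_i Phi(p_i)
    return softmax(mlp(latent_event, F_LAYERS))            # (P(S), P(B))

# Hypothetical toy jets; the last column is an illustrative numeric PID label.
jet_a = np.array([[0.0, 0.0, 0.6, 1.0],
                  [0.1, -0.2, 0.3, -1.0],
                  [-0.3, 0.2, 0.1, 0.0]])
jet_b = np.array([[0.05, 0.1, 0.9, 0.0],
                  [0.2, 0.0, 0.1, 1.0]])

p_a = pfn_forward(jet_a)
p_a_shuffled = pfn_forward(jet_a[::-1])  # same particles, reversed order
p_b = pfn_forward(jet_b)                 # a jet with a different multiplicity
print(np.allclose(p_a, p_a_shuffled), p_a.sum(), p_b.shape)
```

$\Phi$ acts on each particle separately and the results are summed, so the same weights handle jets of any multiplicity, and the output does not depend on particle order. Unlike the EFN form, $z_i$ here is an ordinary input to $\Phi$ rather than a linear weight, so nothing enforces IRC safety. This is the extra freedom the PFNs exploit.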

Paper Structure

This paper contains 18 sections, 19 equations, 21 figures, 4 tables.

Figures (21)

  • Figure 1: A visualization of the Observable Decomposition, $\mathcal{O}(\{p_i\}) = F\big(\sum_i \Phi(p_i)\big)$. Each particle in the event is mapped by $\Phi$ to an internal (latent) particle representation, shown here as three abstract illustrations for a latent space of dimension three. The latent representation is then summed over all particles to arrive at a latent event representation, which is mapped by $F$ to the value of the observable. For the IRC-safe case, $\mathcal{O}(\{p_i\}) = F\big(\sum_i z_i\,\Phi(\hat{p}_i)\big)$, $\Phi$ takes in only the angular information of the particle, and the sum is weighted by the particle energies or transverse momenta.
  • Figure 2: The calorimeter image representation decomposed into a collection of $\Phi(y,\phi)$ filters according to the IRC-safe Observable Decomposition, shown here for the illustrative case of a $4\times4$ image. The energy in each pixel takes the form $\sum_i z_i\,\Phi(y_i,\phi_i)$. Here $\Phi(y,\phi)$ is an indicator function that determines whether a particle at position $(y,\phi)$ hits the pixel.
  • Figure 3: The radiation moment representation decomposed into a collection of $\Phi(y,\phi)$ filters according to the IRC-safe Observable Decomposition. The $(m,n)$ moment of the energy distribution in the rapidity-azimuth plane, $\sum_i z_i\, y_i^m \phi_i^n$, corresponds to the filter $\Phi(y,\phi)=y^m\phi^n$. The filters are shown with $m$ increasing downward and $n$ increasing to the right.
  • Figure 4: The dense networks used here to parametrize (a) the per-particle mapping $\Phi$ and (b) the function $F$, shown for a latent space of dimension $\ell = 8$. For the EFN, the latent observable is $\mathcal{O}_a = \sum_i z_i \, \Phi_a(y_i, \phi_i)$. For the PFN family, the latent observable is $\mathcal{O}_a = \sum_i \Phi_a(y_i, \phi_i, z_i, \mathrm{PID}_i)$, with different levels of particle-ID (PID) information. The output of $F$ is a softmax-normalized signal ($S$) versus background ($B$) discriminant.
  • Figure 5: The AUC (area under the ROC curve) of the EFN and PFN models as a function of latent dimension, varied from 2 to 256 in powers of 2. The spread in values comes from training each model ten times with different initializations. Performance generally increases with latent dimension, saturating by latent dimension 256. The best model is PFN-ID, which uses full particle-type information. It is followed closely by PFN-Ex, which uses experimentally realistic particle-type information. The PFN without any extra information performs roughly the same as PFN-Ch, which uses charge information. The EFN performs lowest, which indicates that IRC-unsafe information carries additional discrimination power.
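The decompositions in Figures 2 and 3 can be checked numerically, since a jet image and a radiation moment are both $z$-weighted sums of a fixed filter $\Phi(y,\phi)$. The sketch below makes three assumptions: toy particles with random positions inside a hypothetical $[-0.4, 0.4]^2$ patch of the rapidity-azimuth plane, a $4\times4$ grid over that patch, and $z_i = p_{T,i}/\sum_j p_{T,j}$. The helper functions are illustrative names, not part of any library. The sketch compares the decomposed image against NumPy's weighted `np.histogram2d`.

```python
import numpy as np

def pixel_indicator(y, phi, y_lo, y_hi, phi_lo, phi_hi):
    """Filter Phi for one pixel: 1 if (y, phi) lies in [y_lo, y_hi) x [phi_lo, phi_hi)."""
    return ((y >= y_lo) & (y < y_hi) & (phi >= phi_lo) & (phi < phi_hi)).astype(float)

def image_via_decomposition(z, y, phi, edges):
    """Jet image whose pixel (a, b) is sum_i z_i * Phi_ab(y_i, phi_i)."""
    n = len(edges) - 1
    img = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            img[a, b] = np.sum(z * pixel_indicator(y, phi, edges[a], edges[a + 1],
                                                   edges[b], edges[b + 1]))
    return img

def moment_via_decomposition(z, y, phi, m, n):
    """(m, n) radiation moment: sum_i z_i * y_i**m * phi_i**n, i.e. Phi = y^m phi^n."""
    return np.sum(z * y**m * phi**n)

rng = np.random.default_rng(2)
# Toy particles kept strictly inside the patch, so the last-bin edge convention
# of np.histogram2d (closed upper edge) never matters.
y, phi = rng.uniform(-0.39, 0.39, size=(2, 20))
pts = rng.exponential(10.0, size=20)
z = pts / pts.sum()
edges = np.linspace(-0.4, 0.4, 5)  # 4 bins per axis

img = image_via_decomposition(z, y, phi, edges)
hist, _, _ = np.histogram2d(y, phi, bins=[edges, edges], weights=z)
print(np.allclose(img, hist))
print(moment_via_decomposition(z, y, phi, 0, 0))  # total weight of the jet
```

In this sketch, fixed indicator and monomial filters play the role of $\Phi$. An EFN instead learns $\Phi$ from data, and in that sense the decomposition places images, moments, and learned representations in one framework.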