Maximizing Boosted Top Identification by Minimizing N-subjettiness

Jesse Thaler, Ken Van Tilburg

TL;DR

The paper advances jet substructure by minimizing $N$-subjettiness over all subjet directions using a k-means–like algorithm, introducing an angular weighting exponent $\beta$ and an efficient minimization procedure. With BOOST2010 top-quark benchmarks, it demonstrates that the minimized $\tau_N$ approach, especially $\beta=1$ with the $\tau^{(1)}_3/\tau^{(1)}_2$ ratio, yields strong boosted-top tagging and robust performance across jet $p_T$. It also shows that multivariate discriminants combining $\tau_N$, their ratios, and jet mass can further enhance discrimination, and it outlines extending the minimization concept to event-level $N$-jettiness to define a fixed-$N$ cone jet algorithm. Overall, the work provides a rigorously defined, infrared-safe, and versatile framework for boosted object tagging and jet finding.

Abstract

N-subjettiness is a jet shape designed to identify boosted hadronic objects such as top quarks. Given N subjet axes within a jet, N-subjettiness sums the angular distances of jet constituents to their nearest subjet axis. Here, we generalize and improve on N-subjettiness by minimizing over all possible subjet directions, using a new variant of the k-means clustering algorithm. On boosted top benchmark samples from the BOOST2010 workshop, we demonstrate that a simple cut on the 3-subjettiness to 2-subjettiness ratio yields 20% (50%) tagging efficiency for a 0.23% (4.1%) fake rate, making N-subjettiness a highly effective boosted top tagger. N-subjettiness can be modified by adjusting an angular weighting exponent, and we find that the jet broadening measure is preferred for boosted top searches. We also explore multivariate techniques, and show that additional improvements are possible using a modified Fisher discriminant. Finally, we briefly mention how our minimization procedure can be extended to the entire event, allowing the event shape N-jettiness to act as a fixed N cone jet algorithm.
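
To make the definition in the abstract concrete, the sketch below evaluates $N$-subjettiness for a fixed set of candidate axes. The abstract only says that angular distances to the nearest axis are summed, with an angular weighting exponent. The $p_T$ weighting, the normalization by $\sum_k p_{T,k} R_0^\beta$, the jet radius parameter `r0`, and the `(pt, y, phi)` tuple layout are assumptions taken from the usual form of this observable, and the function names are hypothetical. The minimization over axis directions is not performed here.

```python
import math

def delta_r(y1, phi1, y2, phi2):
    """Rapidity-azimuth distance, with the azimuthal difference wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(y1 - y2, dphi)

def n_subjettiness(constituents, axes, beta=1.0, r0=1.0):
    """Assumed form: sum_k pT_k * min_A(DeltaR_{A,k})**beta / (sum_k pT_k * r0**beta).

    constituents: list of (pt, y, phi) tuples (assumed layout)
    axes:         list of (y, phi) candidate subjet axes; N = len(axes)
    """
    d0 = sum(pt for pt, _, _ in constituents) * r0 ** beta
    total = 0.0
    for pt, y, phi in constituents:
        # Each constituent contributes its distance to the nearest axis only.
        nearest = min(delta_r(y, phi, ay, aphi) for ay, aphi in axes)
        total += pt * nearest ** beta
    return total / d0

# Toy two-particle "jet" (made-up numbers, for illustration only).
jet = [(100.0, 0.0, 0.0), (50.0, 0.4, 0.3)]
tau1 = n_subjettiness(jet, [(0.0, 0.0)])              # one axis on the harder particle
tau2 = n_subjettiness(jet, [(0.0, 0.0), (0.4, 0.3)])  # one axis on each particle
```

With one axis per particle, every constituent sits on an axis and `tau2` vanishes, while `tau1` stays nonzero. A small $\tau_N/\tau_{N-1}$ ratio is what flags an $N$-prong jet.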

Paper Structure

This paper contains 15 sections, 13 equations, 15 figures, 3 tables.

Figures (15)

  • Figure 1: Comparison of $N$-subjettiness to other boosted top taggers using benchmark samples from the BOOST2010 report \cite{Abdesselam:2010pt}. These efficiency/mistag curves are taken from Ref. \cite{Abdesselam:2010pt} and then overlaid with our results from Fig. \ref{fig:TopSigEff} (for a one-dimensional $\tau_3/\tau_2$ cut) and Fig. \ref{fig:FisherEfficiency} (for a multivariate $\tau_N$ method). Details about these curves are given in Sec. \ref{sec:topTaggingPerformance}, and we will use a different range for the vertical axis in subsequent figures to highlight the small mistag rate region. Except for the very high efficiency region, $N$-subjettiness outperforms previous top tagging methods.
  • Figure 2: Top row: Event displays for a typical top jet with invariant mass near $m_{\rm top}$. In (a), the orange square, circles, and crosses indicate the axes that minimize $\tilde{\tau}_1$, $\tilde{\tau}_2$, and $\tilde{\tau}_3$, respectively, for $\beta = 1$ ("linear" minimization). The dashed orange line indicates the edge of the two Voronoi regions for the axes minimizing $\tilde{\tau}^{(1)}_2$, and the solid orange lines indicate the Voronoi edges for the axes minimizing $\tilde{\tau}^{(1)}_3$. In (b), we show the same top jet with equivalent information for $\beta = 2$ ("quadratic" minimization) in black, and in (c) for $\beta = 1$ and the axes found by the exclusive $k_T$ algorithm in gray. In this and subsequent event displays, the particles are clustered into virtual calorimeter cells of size 0.1 by 0.1, and the marker area for each cell is proportional to its scalar transverse momentum. Bottom row: similar diagrams for a fat QCD jet with mass near $m_{\rm top}$.
  • Figure 3: Convergence of the minimization algorithm for notable values of $\beta$ on a one-dimensional two-particle configuration. One particle is located at $y = 0$ with $p_T = e^{- \delta}$, the other at $y=1$ with $p_T = e^{+ \delta}$, and the global minimum of $\tilde{\tau}_1^{(\beta)}$ is located at $y_0 = \frac{1}{2}(1 + \tanh \frac{\delta}{\beta - 1})$. The algorithm is initialized at $y_0^{(0)} = 0.1$, which is closer to the softer particle. Convergence to the global minimum of $\tilde{\tau}_1^{(\beta)}$ is reached for $1 \le \beta < 3$. The algorithm can converge to a non-global minimum for $\beta < 1$ if the initial axis is chosen too close to the softer particle (here shown by $\beta = 0.9$), and the algorithm diverges for $\beta \ge 3$ (here shown for the critical case $\beta = 3$). For $\beta = 2$, the algorithm finds the global minimum in one step, as expected from Lloyd's algorithm.
  • Figure 4: Convergence path of the minimization algorithm for $N=3$ and $\beta = 1$. Shown is the same top jet as in Fig. \ref{fig:eventDisplays}. Panels (a), (b) and (c) show three different initial seedings for our modified $k$-means clustering procedure. The open circle is the seed position, the dots are the updated positions, and a line connecting them is drawn to guide the eye. The first two seeds find the correct global minimum in a small number of steps, while the third seed gets trapped at a local minimum.
  • Figure 5: Difference between the minimum value of $\tau_N$ and the exclusive $k_T$ $\tilde{\tau}_N$. The event sample is the 500-600 GeV $t\bar{t}$ sample detailed in Sec. \ref{sec:analysisOverview}, with the same event selection as Fig. \ref{fig:tauHistograms2Axes}. The top row is $\beta = 1$, the bottom row is $\beta = 2$, and the columns are $\tau_1^{(\beta)}$, $\tau_2^{(\beta)}$, and $\tau_3^{(\beta)}$. For $\beta = 1$, the difference between the minimum $\tau_N$ and the exclusive $k_T$ $\tilde{\tau}_N$ can be of order 50%, though this difference is ameliorated by doing a single pass of the minimization procedure using the exclusive $k_T$ axes as a seed. For $\beta = 2$, the values of $N$-subjettiness are typically different by less than 10%, except for rare cases where the exclusive $k_T$ axes are near a local minimum of $\tilde{\tau}_N$, such that even doing a single pass of the minimization procedure does not help much.
  • ...and 10 more figures
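
The one-dimensional two-particle configuration of Figure 3 is simple enough to check numerically. The sketch below compares the closed-form minimum quoted in that caption, $y_0 = \frac{1}{2}(1 + \tanh \frac{\delta}{\beta - 1})$, against a brute-force grid scan and against a fixed-point iteration. Several things here are assumptions: the value $\delta = 0.5$, the omission of the overall normalization (which does not move the minimum), and above all the update rule, a Weiszfeld-style reweighted mean chosen because it reduces to the $p_T$-weighted mean of Lloyd's algorithm at $\beta = 2$. It is not necessarily the paper's exact update step, and all function names are hypothetical. The sketch is restricted to $1 < \beta < 3$.

```python
import math

def closed_form_minimum(delta, beta):
    """Minimum location quoted in the Figure 3 caption (used here for beta > 1 only)."""
    return 0.5 * (1.0 + math.tanh(delta / (beta - 1.0)))

def tau1_unnormalized(y0, delta, beta):
    """Objective for one axis at y0: pT-weighted distances to particles at y=0 and y=1."""
    return (math.exp(-delta) * abs(y0) ** beta
            + math.exp(delta) * abs(1.0 - y0) ** beta)

def grid_minimum(delta, beta, steps=10_000):
    """Brute-force scan of y0 over [0, 1]; an independent check on the closed form."""
    candidates = (i / steps for i in range(steps + 1))
    return min(candidates, key=lambda y: tau1_unnormalized(y, delta, beta))

def iterate_axis(delta, beta, y_start=0.1, n_iter=200):
    """Assumed update: y0 <- sum(w_k * y_k) / sum(w_k), with w_k = pT_k * |y0 - y_k|**(beta - 2)."""
    particles = [(math.exp(-delta), 0.0), (math.exp(delta), 1.0)]
    y0 = y_start
    for _ in range(n_iter):
        num = den = 0.0
        for pt, y in particles:
            dist = max(abs(y0 - y), 1e-12)  # guard against a zero distance when beta < 2
            weight = pt * dist ** (beta - 2.0)
            num += weight * y
            den += weight
        y0 = num / den
    return y0

delta = 0.5  # assumed value; the caption leaves delta unspecified
results = {
    beta: (closed_form_minimum(delta, beta),
           grid_minimum(delta, beta),
           iterate_axis(delta, beta))
    for beta in (1.5, 2.0, 2.5)
}
```

Starting from `y_start=0.1` as in the caption, the assumed update reaches the closed-form value for these three exponents, and for $\beta = 2$ it does so after a single step because the weights no longer depend on the current axis position.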