Learning Selection Cuts With Gradients

Mike Hance, Juan Robles

TL;DR

The paper addresses the challenge of optimizing multi-feature selection cuts in high-energy physics by learning cuts as biases in a minimal neural network (CABIN) and performing gradient-based optimization with differentiable cuts. It introduces a differentiable cut formulation using logistic activations and a product-based score, along with loss functions that target a specified signal efficiency and regulate cut evolution through contractive terms. In comparisons with TMVA kCuts, a simple neural network, and an XGBoost BDT on SUSY slepton discrimination data, CABIN achieves competitive performance and yields smooth, robust cuts across efficiency targets. The work highlights the practical utility of fully differentiable cut optimization and outlines extensions toward significance metrics and end-to-end differentiable analysis chains.

Abstract

Many analyses in high-energy physics rely on selection thresholds (cuts) applied to detector, particle, or event properties. Initial cut values can often be guessed from physical intuition, but cut optimization, especially for multiple features, is commonly performed by hand, or skipped entirely in favor of multivariate algorithms like BDTs or neural networks. We revisit this problem and develop a cut optimization approach based on gradient descent. Cut thresholds are learned as parameters of a network with a simple architecture, and can be tuned to achieve a target signal efficiency through the use of custom loss functions. Contractive terms in the loss can be used to ensure a smooth evolution of cuts as functions of efficiency, particle kinematics, or event features. The method is used to classify events in a search for Supersymmetry, and the performance is compared with common classification tools. An implementation of this approach is available in a public code repository and Python package.
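
To make the idea of a product-based score and an efficiency-targeting loss concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the function names, the `scale` and `alpha` parameters, and the exact form of the loss (squared distance of the soft signal efficiency from the target, plus a weighted soft background efficiency) are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function, identical to 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def soft_cut_score(x, cuts, signs, scale=10.0):
    """Soft pass/fail score per event (hypothetical helper).

    x     : array of shape (n_events, n_features)
    cuts  : array of shape (n_features,), one threshold per feature
    signs : +1 for a "greater-than" cut, -1 for a "less-than" cut
    scale : sharpness of the logistic; larger values approach a hard cut
    """
    per_feature = sigmoid(scale * signs * (x - cuts))
    # An event passes only if it passes every cut, hence the product.
    return per_feature.prod(axis=1)

def efficiency_loss(x_sig, x_bkg, cuts, signs, target_eff, alpha=1.0, scale=10.0):
    """Assumed loss form: hit the target signal efficiency, suppress background."""
    sig_eff = soft_cut_score(x_sig, cuts, signs, scale).mean()
    bkg_eff = soft_cut_score(x_bkg, cuts, signs, scale).mean()
    return (sig_eff - target_eff) ** 2 + alpha * bkg_eff

# Toy usage: two features, require x0 > 1 and x1 < 1.
cuts = np.array([1.0, 1.0])
signs = np.array([1.0, -1.0])
x_sig = np.array([[2.0, 0.0], [3.0, -1.0]])
x_bkg = np.array([[0.0, 2.0]])
loss = efficiency_loss(x_sig, x_bkg, cuts, signs, target_eff=1.0, scale=50.0)
```

Because every operation here is smooth in `cuts`, the same expressions written in an automatic-differentiation framework would yield gradients with respect to the thresholds, which is what allows the cuts to be tuned by gradient descent.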

Paper Structure

This paper contains 13 sections, 9 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: The logistic function $\sigma(x)$ can be used to approximate the Heaviside step function $\Theta(x)$ by scaling the inputs. This provides a differentiable proxy for rectangular cuts.
  • Figure 2: Feynman diagrams for SUSY slepton production (left) and background from SM diboson production (right).
  • Figure 3: Distributions of kinematic features used to classify SUSY and SM $WW$ events.
  • Figure 4: A one-to-one linear network used to learn biases ($b_k$) that correspond to cuts. The sign of the weight ($w_k$) determines whether the bias term corresponds to a "greater-than" or "less-than" cut; for simplicity we usually fix them to $\pm 1$.
  • Figure 5: A simple linear network, with trainable weights and no bias terms.
  • ...and 3 more figures
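
The captions of Figures 1 and 4 can be illustrated numerically. The sketch below checks how closely a scaled logistic tracks the Heaviside step away from the threshold, and shows how a bias maps to a cut value. It assumes the activation is applied to $w_k x_k + b_k$, so that the threshold sits at $-b_k / w_k$; the helper names are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def logistic(z):
    # Numerically stable logistic function, identical to 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def heaviside(z):
    # Hard step: 1 where z > 0, else 0.
    return (z > 0).astype(float)

def max_gap(scale, margin=0.1):
    """Largest |sigma(scale * x) - Theta(x)| for |x| > margin on [-1, 1]."""
    x = np.linspace(-1.0, 1.0, 201)
    away = np.abs(x) > margin
    return np.max(np.abs(logistic(scale * x) - heaviside(x))[away])

def cut_from_bias(w, b):
    """Threshold implied by w * x + b = 0 (assumed parametrization)."""
    return -b / w

for scale in (1.0, 10.0, 100.0):
    print(f"scale={scale:6.1f}  max gap away from threshold: {max_gap(scale):.3g}")

# w = +1 gives a "greater-than" cut (x > -b); w = -1 gives a "less-than" cut (x < b).
print(cut_from_bias(+1.0, -2.5), cut_from_bias(-1.0, 2.5))
```

The gap shrinks as the scale grows, but so does the region where the gradient is non-negligible, so the scale trades fidelity to a rectangular cut against how easily the thresholds can be trained.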