Angular Coefficients from Interpretable Machine Learning with Symbolic Regression

Josh Bendavid, Daniel Conde, Manuel Morales-Alvarado, Veronica Sanz, Maria Ubiali

TL;DR

This work addresses the challenge of obtaining analytic, interpretable expressions for the angular coefficients $A_i$ governing electroweak boson decays at the LHC. By applying symbolic regression (SR) via PySR to Monte Carlo (MC) generated data, the authors derive compact closed-form expressions for $A_i$ as functions of $p_T$, $y$, and $m$, validated across 1D, 2D, and 3D kinematic spaces. The results show that SR can reproduce MC predictions within uncertainties, maintain key symmetries such as the Lam-Tung relation, and provide interpretable surrogates that are useful for rapid fits and theory–data comparisons. This approach offers fast, transparent parametrisations of angular observables and sets the stage for extensions to higher orders and direct data applications in precision electroweak studies.
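
For orientation, the coefficients $A_i$ are conventionally defined through a harmonic decomposition of the lepton angular distribution in the boson rest frame (commonly the Collins-Soper frame). The form below is the standard textbook decomposition, given here as an assumption about the convention; the paper's own normalisation and notation may differ. Here $\theta$ and $\phi$ are the lepton polar and azimuthal angles, and $\sigma^{U+L}$ denotes the unpolarised cross section:

$$
\frac{d\sigma}{dp_T\,dy\,dm\,d\cos\theta\,d\phi}
= \frac{3}{16\pi}\,\frac{d\sigma^{U+L}}{dp_T\,dy\,dm}
\Big[ (1+\cos^2\theta)
+ \tfrac{1}{2} A_0\,(1-3\cos^2\theta)
+ A_1 \sin 2\theta \cos\phi
+ \tfrac{1}{2} A_2 \sin^2\theta \cos 2\phi
+ A_3 \sin\theta \cos\phi
+ A_4 \cos\theta
+ A_5 \sin^2\theta \sin 2\phi
+ A_6 \sin 2\theta \sin\phi
+ A_7 \sin\theta \sin\phi \Big]
$$

In this convention each $A_i$ depends on $(p_T, y, m)$ only, and the Lam-Tung relation reads $A_0 = A_2$, which holds through $\mathcal{O}(\alpha_s)$ in perturbative QCD.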

Abstract

We explore the use of symbolic regression to derive compact analytical expressions for angular observables relevant to electroweak boson production at the Large Hadron Collider (LHC). Focusing on the angular coefficients that govern the decay distributions of $W$ and $Z$ bosons, we investigate whether symbolic models can approximate these quantities, which are typically computed via computationally costly numerical procedures, with high fidelity and interpretability. Using the PySR package, we first validate the approach in controlled settings, namely angular distributions in lepton-lepton collisions in QED and leading-order Drell-Yan production at the LHC. We then apply symbolic regression to extract closed-form expressions for the angular coefficients $A_i$ as functions of transverse momentum, rapidity, and invariant mass, using next-to-leading order simulations of $pp \to \ell^+\ell^-$ events. Our results demonstrate that symbolic regression can produce accurate and generalisable expressions that match Monte Carlo predictions within uncertainties, while preserving interpretability and providing insight into the kinematic dependence of angular observables.
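
To make the QED validation setting concrete, the sketch below builds a binned angular distribution and recovers its shape with a least-squares fit. It rests on several assumptions that are not taken from the paper: the target shape is the textbook $1 + \cos^2\theta$ of massless lepton pair production, the bin contents are exact bin averages rather than MC samples, and the fit uses a fixed basis $a + b\cos^2\theta$ as a simple stand-in. It is not symbolic regression and does not call PySR; it only illustrates the kind of known closed form a symbolic regressor would be expected to rediscover. The helper names are hypothetical.

```python
# Assumption: benchmark shape is d(sigma)/d(cos theta) proportional to 1 + cos^2(theta).
# Assumption: noise-free bin averages stand in for Monte Carlo histograms.

N_BINS = 30  # same number of cos(theta) bins as quoted for Figure 1


def bin_average(lo, hi):
    """Exact average of 1 + c^2 over the bin [lo, hi]."""
    integral = (hi - lo) + (hi**3 - lo**3) / 3.0
    return integral / (hi - lo)


def fit_a_plus_b_c2(cs, ys):
    """Least-squares fit of y = a + b * c^2 via the 2x2 normal equations."""
    n = len(cs)
    s2 = sum(c**2 for c in cs)
    s4 = sum(c**4 for c in cs)
    sy = sum(ys)
    sy2 = sum(y * c**2 for c, y in zip(cs, ys))
    det = n * s4 - s2 * s2
    a = (sy * s4 - s2 * sy2) / det
    b = (n * sy2 - s2 * sy) / det
    return a, b


# Uniform bin edges and centres on cos(theta) in [-1, 1]
edges = [-1.0 + 2.0 * i / N_BINS for i in range(N_BINS + 1)]
centres = [0.5 * (lo + hi) for lo, hi in zip(edges[:-1], edges[1:])]
values = [bin_average(lo, hi) for lo, hi in zip(edges[:-1], edges[1:])]

a, b = fit_a_plus_b_c2(centres, values)
print(f"fitted shape: {a:.6f} + {b:.6f} * cos^2(theta)")
```

With bin half-width $h = 1/30$, the bin average of $1 + c^2$ is exactly $1 + c^2 + h^2/3$, so the fit returns $b = 1$ and a constant slightly above 1. This small offset comes purely from the finite bin width and shrinks as the number of bins grows.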

Paper Structure

This paper contains 20 sections, 24 equations, 24 figures, 20 tables.

Figures (24)

  • Figure 1: Angular distribution for 30 bins in $\cos\theta$.
  • Figure 2: Angular distribution for 30 bins normalised to the analytical equation.
  • Figure 3: Relative size of normalisation coefficients for different number of bins.
  • Figure 4: Parton luminosity values obtained from the reweighted simulation of the LO Drell-Yan double-differential cross section, Eq. \ref{eq:l2_pdf}.
  • Figure 5: Parton luminosity values obtained with the symbolic regression model corresponding to a complexity of 33; see Table \ref{tab:selected_equations}.
  • ...and 19 more figures