Precision calibration of calorimeter signals in the ATLAS experiment using an uncertainty-aware neural network

ATLAS Collaboration

TL;DR

The paper addresses precise, per-cluster calibration of ATLAS calorimeter signals using an uncertainty-aware Bayesian neural network (BNN). By learning the EM-scale cluster response $\mathcal{R}_{\text{clus}}^{\text{EM}} = E_{\text{clus}}^{\text{EM}} / E_{\text{clus}}^{\text{dep}}$ as a function of 15 topo-cluster features, the approach yields a smooth, multi-dimensional calibration that improves linearity and local energy resolution compared to local hadronic calibration (LCW) and a deep neural network baseline. Crucially, it provides predictive uncertainties decomposed into systematic and statistical components, validated against an alternative Repulsive Ensemble (RE) estimator, showing consistent, conservative uncertainty estimates. The results demonstrate notable gains in low-energy regions and regions affected by detector transitions, with potential for data-quality selections and uncertainty-informed analyses, while highlighting areas for further validation on real data and at higher pile-up. The methodology offers a scalable path toward uncertainty-aware ML calibrations in complex calorimeter systems and broader high-energy physics applications.
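
As a minimal sketch of how a learned response could be turned into a calibrated energy, the snippet below inverts the response definition above: dividing the EM-scale signal by the predicted response gives an estimate of the deposited energy. The function name `calibrate`, the first-order error propagation, and the numerical values are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def calibrate(e_em, r_pred, sigma_r):
    """Apply a predicted response to EM-scale cluster energies.

    e_em    : EM-scale cluster energies
    r_pred  : predicted responses (estimates of E_EM / E_dep)
    sigma_r : absolute uncertainties on the predicted responses

    Returns the calibrated energies and their uncertainties.
    Assumption: first-order propagation, sigma_E / E_cal = sigma_R / R.
    """
    e_em = np.asarray(e_em, dtype=float)
    r_pred = np.asarray(r_pred, dtype=float)
    sigma_r = np.asarray(sigma_r, dtype=float)

    e_cal = e_em / r_pred               # invert R = E_EM / E_dep
    sigma_e = e_cal * sigma_r / r_pred  # same relative uncertainty as R
    return e_cal, sigma_e


# Hypothetical clusters: energies in GeV, responses and uncertainties invented.
e_cal, sigma_e = calibrate([6.0, 1.2], [0.75, 0.60], [0.075, 0.12])
print(e_cal, sigma_e)
```

The relative uncertainty on the calibrated energy equals the relative uncertainty on the response in this approximation, which is only reasonable when the response uncertainty is small compared with the response itself.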

Abstract

The ATLAS experiment at the Large Hadron Collider explores the use of modern neural networks for a multi-dimensional calibration of its calorimeter signal defined by clusters of topologically connected cells (topo-clusters). The Bayesian neural network (BNN) approach not only yields a continuous and smooth calibration function that improves performance relative to the standard calibration but also provides uncertainties on the calibrated energies for each topo-cluster. The results obtained by using a trained BNN are compared to the standard local hadronic calibration and to a calibration provided by training a deep neural network. The uncertainties predicted by the BNN are interpreted in the context of a fractional contribution to the systematic uncertainties of the trained calibration. They are also compared to uncertainty predictions obtained from an alternative estimator employing repulsive ensembles.

Paper Structure

This paper contains 53 sections, 53 equations, 19 figures, 3 tables.

Figures (19)

  • Figure 1: The distributions of the energy $E_{\text{clus}}^{\text{dep}}$ deposited in topo-clusters are shown in \ref{fig:composition:deposit} in a stack, for the clusters categorised as electromagnetic, hadronic or composite by applying the classifications introduced in Section \ref{sec:detdat:clusters:truth}. In \ref{fig:composition:signal}, the corresponding stacked distributions for the basic cluster signal $E_{\text{clus}}^{\text{EM}}$ at EM scale are shown. The distributions of the machine-learning training target, the topo-cluster response $\mathcal{R}_{\text{clus}}^{\text{EM}}$ constructed from $E_{\text{clus}}^{\text{EM}}$ and $E_{\text{clus}}^{\text{dep}}$ according to Eq. \ref{eq:target-response}, are shown stacked in \ref{fig:composition:response} for the same three categories. The distributions are filled from the sample of topo-clusters collected from fully simulated calorimeter jets, as described in Section \ref{sec:dataset:object}. The respective lower panels show the relative contributions from the three categories to the inclusive spectrum. The shaded areas indicate the statistical errors.
  • Figure 2: The principal design of the Bayesian neural network (BNN) employed for the regression fits and the corresponding uncertainty predictions for the topo-cluster calibration. The network on the left is designed such that the resulting weights linking the nodes of adjacent layers are described by weight functions $q(\theta)$ with learned averages and widths. The inference of the trained model, represented by a set of learned network parameters $\theta$, is shown on the right, where an ensemble of $N$ networks is sampled from $q(\theta)$ to generate $N$ predictions from the model. The central prediction $\mathcal{R}_{\text{clus}}^{\text{BNN}}$ is the mode (most probable value) of the average of $N$ probability density functions $p(\mathcal{R}|\theta_{s},\mathcal{X}_{\text{clus}})$, where each individual $p(\mathcal{R}|\theta_{s},\mathcal{X}_{\text{clus}})$ represents the probability that the network describes the target $\mathcal{R} = \mathcal{R}_{\text{clus}}^{\text{EM}}$ for a given topo-cluster with feature set $\mathcal{X}_{\text{clus}}$. Each of the $p(\mathcal{R}|\theta_{s},\mathcal{X}_{\text{clus}})$ is defined as the mixture of $i = 1 \ldots N_{\text{mix}}$ Gaussian distributions with learned means $\langle \mathcal{R}\rangle_{\theta_{s},i}$, widths $\sigma_{\theta_{s},i}$ and coefficients $\alpha_{\theta_{s},i}$, as introduced in Eq. \ref{eq:bnn_gmix}, with $N_{\text{mix}} = 3$. The averaged function is the normalised sum of all probability density functions obtained by sampling weights from $q(\theta)$ $N$ times, with $N=50$. It is composed of a total of $N\times N_{\text{mix}}$ Gaussian distributions. The weighted average response $\langle \mathcal{R}\rangle_{\theta_{s}}$ is the sum of $\alpha_{\theta_{s},i}\langle \mathcal{R}\rangle_{\theta_{s},i}$ from the $N_{\text{mix}}$ distributions for each sample $s$. This weighted average is needed to predict the systematic ($\sigma_{\text{syst}}^{\text{BNN}}(\mathcal{X}_{\text{clus}})$) and statistical ($\sigma_{\text{stat}}^{\text{BNN}}(\mathcal{X}_{\text{clus}})$) uncertainties. It is calculated for each of the $N$ sampled sets of network parameters $\theta_{s}$ to obtain $\langle \mathcal{R}\rangle$, the mean response averaged over the $N$ corresponding $\langle \mathcal{R}\rangle_{\theta_{s}}$. This mean is needed exclusively for the calculation of $\sigma_{\text{stat}}^{\text{BNN}}$. All calculations are individually performed for any given topo-cluster at inference of the trained BNN model. The numbers written above some of the edges (links) between nodes are weights sampled from $q(\theta)$ at network inference. They are for illustration only and show the sampling character of the predictions.
  • Figure 3: Schematics describing the design of the repulsive ensemble (RE) model employed for the regression fits and the corresponding uncertainty predictions of the topo-cluster calibration. An ensemble of $N$ networks with identical configurations is configured. These $N$ networks are trained simultaneously with interconnections between their loss functions acting as repulsive forces between them (shown as spring connectors in the schematic). The predictions from the $N$ networks are collected for each topo-cluster in each processed batch during training. A central prediction and the uncertainties are calculated in the same way as illustrated in Figure \ref{fig:bnn_sketch} for the BNN, with the same nomenclature introduced there. The numbers written above some of the edges (links) between nodes illustrate the learned weights for one of the $N$ networks in the repulsive ensemble. They are shown to illustrate that each network fits its own weights within the repulsive action of the loss function.
  • Figure 4: The topo-cluster response $\mathcal{R}_{\text{clus}}$ is evaluated cluster-by-cluster as a function of the energy $E_{\text{clus}}^{\text{dep}}$ deposited in the cluster. In \ref{fig:predpower:rems:edep}, the distribution of the response at EM scale $\mathcal{R}_{\text{clus}}^{\text{EM}}$ (the training target) is shown, while \ref{fig:predpower:rbnn:edep} shows the distribution of the response $\mathcal{R}_{\text{clus}}^{\text{BNN}}$ predicted by the trained BNN. The corresponding distribution of the response $\mathcal{R}_{\text{clus}}^{\text{RE}}$ predicted by the RE is shown in \ref{fig:predpower:rrde:edep}. The topo-clusters are extracted from MC simulations, as described in Section \ref{sec:dataset:object}.
  • Figure 5: The distributions of the topo-cluster response $\mathcal{R}_{\text{clus}}^{\text{EM}}$ and the corresponding predictions from the BNN ($\mathcal{R}_{\text{clus}}^{\text{BNN}}$) and the RE ($\mathcal{R}_{\text{clus}}^{\text{RE}}$) are evaluated cluster-by-cluster as a function of selected features. The distributions of the training target $\mathcal{R}_{\text{clus}}^{\text{EM}}$ are shown in \ref{fig:predpower:rems:eems} as a function of the cluster signal $E_{\text{clus}}^{\text{EM}}$, in \ref{fig:predpower:rems:mfem} as a function of the cluster signal fraction $f_{\text{emc}}$ in the electromagnetic calorimeter, and in \ref{fig:predpower:rems:mcog} as a function of the distance of the topo-cluster centre-of-gravity $|\vec{c}_{\text{clus}}|$. The distributions of the predicted responses $\mathcal{R}_{\text{clus}}^{\text{BNN}}$ and $\mathcal{R}_{\text{clus}}^{\text{RE}}$ as functions of the same features are respectively shown in \ref{fig:predpower:rbnn:eems}, \ref{fig:predpower:rrde:eems} for $E_{\text{clus}}^{\text{EM}}$, in \ref{fig:predpower:rbnn:mfem}, \ref{fig:predpower:rrde:mfem} for $f_{\text{emc}}$, and in \ref{fig:predpower:rbnn:mcog}, \ref{fig:predpower:rrde:mcog} for $|\vec{c}_{\text{clus}}|$. The topo-clusters are extracted from MC simulations, as described in Section \ref{sec:dataset:object}.
  • ...and 14 more figures
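
The inference step described in the Figure 2 caption can be sketched in NumPy for a single topo-cluster. The sketch assumes that the $N$ sampled networks have already produced their Gaussian-mixture parameters, and it uses the usual law-of-total-variance split: the systematic term is the average within-sample mixture variance, and the statistical term is the spread of the per-sample weighted means around their overall mean. The caption does not give these formulas explicitly, so that split, the function name `bnn_summary`, the grid-based mode search, and the toy parameter values are all assumptions made for illustration.

```python
import numpy as np


def bnn_summary(alpha, mu, sigma, grid):
    """Summarise N sampled Gaussian-mixture predictions for one cluster.

    alpha, mu, sigma : arrays of shape (N, N_mix) holding the mixture
                       coefficients, means and widths of each sample s
    grid             : 1D array of response values used to locate the mode

    Returns (mode, sigma_syst, sigma_stat).
    """
    alpha, mu, sigma = (np.asarray(a, dtype=float) for a in (alpha, mu, sigma))
    grid = np.asarray(grid, dtype=float)

    # Weighted average response <R>_{theta_s} of each sampled network.
    mean_s = (alpha * mu).sum(axis=1)
    # Variance of each sampled mixture: E[R^2] - E[R]^2.
    var_s = (alpha * (sigma**2 + mu**2)).sum(axis=1) - mean_s**2

    # Mean response <R> over the N samples (used only for sigma_stat).
    mean = mean_s.mean()
    sigma_syst = np.sqrt(var_s.mean())                  # assumed definition
    sigma_stat = np.sqrt(((mean_s - mean) ** 2).mean())  # assumed definition

    # Average of the N mixture densities on the grid; shape (G, N, N_mix).
    z = (grid[:, None, None] - mu) / sigma
    gauss = np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))
    pdf = (alpha * gauss).sum(axis=2).mean(axis=1)

    # Central prediction: mode of the averaged density (grid resolution).
    mode = grid[np.argmax(pdf)]
    return mode, sigma_syst, sigma_stat


# Toy stand-in for network outputs, with N = 50 samples and N_mix = 3.
rng = np.random.default_rng(seed=1)
N, N_MIX = 50, 3
alpha = rng.dirichlet(np.ones(N_MIX), size=N)     # each row sums to 1
mu = rng.normal(0.6, 0.05, size=(N, N_MIX))       # invented means
sigma = rng.uniform(0.05, 0.15, size=(N, N_MIX))  # invented widths
grid = np.linspace(0.0, 2.0, 2001)

mode, s_syst, s_stat = bnn_summary(alpha, mu, sigma, grid)
print(f"mode={mode:.3f}  syst={s_syst:.3f}  stat={s_stat:.3f}")
```

Locating the mode on a fixed grid limits its precision to the grid spacing; a finer grid or a local optimiser started from the grid maximum would refine it.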