Boosted decision tree reweighting of simulated neutrino interactions for $O(1)$ GeV neutrino cross-section measurements

Z. Lin, S. Akhter, Z. Ahmad Dar, N. S. Alex, M. Betancourt, S. Boyd, H. Budd, G. Caceres, G. A. Díaz, J. Felix, L. Fields, A. M. Gago, P. K. Gaur, S. M. Gilligan, R. Gran, D. A. Harris, A. L. Hart, J. Kleykamp, A. Klustová, D. Last, A. Lozano, X. -G. Lu, S. Manly, W. A. Mann, K. S. McFarland, O. Moreno, J. K. Nelson, V. Paolone, G. N. Perdue, C. Pernas, M. A. Ramírez, N. Roy, D. Ruterbories, H. Schellman, C. J. Solano Salinas, D. S. Correia, M. Sultana, N. H. Vaughan, A. V. Waldron, B. Yaeggy, L. Zazueta

TL;DR

This work presents a generic multidimensional reweighting framework for $O(1)$ GeV neutrino MC using a Boosted Decision Tree Reweighter to map a source generator’s final-state distributions onto a target model’s expectations, thereby enabling efficient reuse of legacy simulations. By organizing events into controllable topology-based categories and training per-category reweighters on detector-relevant observables, the method achieves improved agreement with the target across high-dimensional spaces and derived quantities such as TKIs and efficiencies. The approach is validated by reweighting GENIE v2.12.6 to GENIE v3.04 AR23 for MINERvA’s CCQE-like $\nu_\mu$-carbon sample, demonstrating reduced KS distances and more accurate efficiency predictions, with clear guidance for generalizing to other channels and generators. This technique offers practical benefits by avoiding full MC regeneration and facilitating systematic studies using legacy data while underscoring the need to propagate target-model uncertainties through reweighted predictions.

Abstract

This paper illustrates a generic method for multi-dimensional reweighting of $O(1)$ GeV neutrino interaction Monte Carlo samples. The reweighting is based on a Boosted Decision Tree algorithm trained in a high-dimensional space of detector final-state observables. This enables one generator's events to be reweighted so that their reconstructed particle content and kinematic distributions, as well as the detector efficiency, match those of a target model. The approach establishes an efficient way to reuse legacy Monte Carlo data, avoiding re-generation. As an example, we test its use in a measurement of transverse kinematic imbalance of the $\mu^-$ and proton in charged-current quasielastic-like $\nu_\mu$ events from the MINERvA experiment.
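
The idea of reweighting one sample so that its distributions match a target can be illustrated with a deliberately small toy. The sketch below is not the authors' implementation: it assumes a single observable, one-split trees chosen by a symmetrized chi-square, and a damped per-leaf ratio as the weight update. The function names, the Gaussian toy samples, and all hyperparameters are hypothetical. It also evaluates the K-S distance on the training sample itself for brevity, whereas a real study would use an independent test sample.

```python
import random

def leaf_sums(values, weights, cut):
    """Summed weight in the two leaves of a one-split tree: (v < cut, v >= cut)."""
    low = sum(w for v, w in zip(values, weights) if v < cut)
    return low, sum(weights) - low

def chi2(src, tgt):
    """Symmetrized chi-square between per-leaf source and target weight sums."""
    return sum((s - t) ** 2 / (s + t) for s, t in zip(src, tgt) if s + t > 0)

def reweight(source, target, n_trees=30, learning_rate=0.5, n_cuts=20):
    """Return one weight per source event, boosted over n_trees one-split trees."""
    weights = [len(target) / len(source)] * len(source)
    ones = [1.0] * len(target)
    lo, hi = min(source + target), max(source + target)
    cuts = [lo + (hi - lo) * k / (n_cuts + 1) for k in range(1, n_cuts + 1)]
    for _ in range(n_trees):
        # Pick the split whose two leaves disagree most between source and target.
        cut = max(cuts, key=lambda c: chi2(leaf_sums(source, weights, c),
                                           leaf_sums(target, ones, c)))
        s_low, s_high = leaf_sums(source, weights, cut)
        t_low, t_high = leaf_sums(target, ones, cut)
        # Damped per-leaf correction; the +1 guards against empty leaves.
        f_low = ((t_low + 1.0) / (s_low + 1.0)) ** learning_rate
        f_high = ((t_high + 1.0) / (s_high + 1.0)) ** learning_rate
        weights = [w * (f_low if v < cut else f_high)
                   for v, w in zip(source, weights)]
    return weights

def ks_distance(a, wa, b, wb):
    """Largest gap between the weighted empirical CDFs of samples a and b."""
    ta, tb = sum(wa), sum(wb)
    points = sorted([(v, w / ta, 0.0) for v, w in zip(a, wa)] +
                    [(v, 0.0, w / tb) for v, w in zip(b, wb)])
    cdf_a = cdf_b = dist = 0.0
    for i, (v, da, db) in enumerate(points):
        cdf_a += da
        cdf_b += db
        # Compare only after all events sharing the same value are counted.
        if i + 1 == len(points) or points[i + 1][0] != v:
            dist = max(dist, abs(cdf_a - cdf_b))
    return dist

# Toy stand-ins for a "source" and a "target" generator prediction.
rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(1000)]
target = [rng.gauss(0.5, 1.0) for _ in range(1000)]

unit_src = [1.0] * len(source)
unit_tgt = [1.0] * len(target)
new_weights = reweight(source, target)
d_before = ks_distance(source, unit_src, target, unit_tgt)
d_after = ks_distance(source, new_weights, target, unit_tgt)
print("D_KS before:", d_before, "after:", d_after)
```

A production reweighter would use deeper trees over many observables at once, which is what lets the weights capture correlations that one-dimensional corrections miss.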

Paper Structure

This paper contains 12 sections, 9 equations, 15 figures, 1 table.

Figures (15)

  • Figure 1: Example of a decision tree splitting 100 source events and 100 target events into different kinematic regions based on boolean conditions on parameters $p_z^\mu,p_y^p,$ and $T_p$.
  • Figure 2: Schematic illustration of the single-transverse kinematic imbalance variables $\delta \phi_T$, $\delta p_T$, and $\delta \alpha_T$, defined in the plane transverse to the neutrino direction (figure derived from Figure 2 of reference Lu:2015tcr and Figure 2 of reference Cai:2020PRD). The neutrino direction, taken as the $\hat{z}$-axis, points out of the page, while the transverse $\hat{x}$-$\hat{y}$ plane is the plane of the page. If the hadron momentum $\vec{p}_N$ (possibly of a single particle) is observed, the TKI variables account for the differences between its transverse component and the true transverse momentum transfer $\vec{q}_T$.
  • Figure 3: Categorical histogram of the MINERvA ME CCQE-like $\nu_\mu$-carbon cross-section contributions from the 7 categories listed in Table 1. Orange: GENIE v2.12.6. Blue: GENIE v3.04.00 AR23_20i_00_000.
  • Figure 4: Differential cross-sections of categories "1p0n", "1pNn", "2pNn", "2pNn", and "others" combined are plotted with respect to leading proton $p_x, p_y, p_z$ (a, b, c); calorimetric energy $\sum T_p$ (d); $\mu^-$ $p_y, p_z$ (e, f); TKI variables $\delta p_T, \delta \alpha_T, \delta \phi_T$ (g, h, i); and leading proton $T_p, \theta$ (j, k). A frequency histogram of weights (l) is also shown. Error bars (visible only in the ratios) are statistical only. Green: test sample GENIE v2.12.6 (v2). Blue: reweighted test sample (v2$'$). Red: target sample GENIE v3.04.00 AR23_20i_00_000 (v3). Cross-section ratios of v2 and v2$'$ to v3 are plotted under each histogram, in yellow and purple respectively. The K-S test statistic $D_{\text{KS}}$ before (v2 compared to v3) and after (v2$'$ compared to v3) reweighting is printed on each histogram.
  • Figure 5: Differential cross-sections of all categories combined are plotted with respect to calorimetric momenta $\sum p_x, \sum p_y, \sum p_z,$ and energy $\sum T_p$ summed over all final-state protons (a, b, c, d); and $\mu^-$ $p_y, p_z$ (e, f). A frequency histogram of weights (g) is also shown. Error bars (visible only in the ratios) are statistical only. Green: test sample GENIE v2.12.6 (v2). Blue: reweighted test sample (v2$'$). Red: target sample GENIE v3.04.00 AR23_20i_00_000 (v3). Cross-section ratios of v2 and v2$'$ to v3 are plotted under each histogram, in yellow and purple respectively. The K-S test statistic $D_{\text{KS}}$ before (v2 compared to v3) and after (v2$'$ compared to v3) reweighting is printed on each histogram.
  • ...and 10 more figures
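
The transverse kinematic imbalance variables named in the Figure 2 caption can be computed from the muon and proton momenta. The caption itself gives no formulas, so the sketch below assumes the commonly used single-TKI definitions: $\delta \vec{p}_T = \vec{p}_T^{\,\mu} + \vec{p}_T^{\,p}$, $\delta \alpha_T = \arccos\left(-\hat{p}_T^{\,\mu} \cdot \delta \hat{p}_T\right)$, and $\delta \phi_T = \arccos\left(-\hat{p}_T^{\,\mu} \cdot \hat{p}_T^{\,p}\right)$, with the neutrino direction along $+\hat{z}$. The function names and the example momenta are hypothetical.

```python
import math

def opening_angle(ax, ay, bx, by):
    """Angle in radians between two transverse vectors; NaN if either is zero."""
    norm = math.hypot(ax, ay) * math.hypot(bx, by)
    if norm == 0.0:
        return math.nan
    cos_angle = (ax * bx + ay * by) / norm
    # Clamp to [-1, 1] to absorb floating-point round-off before acos.
    return math.acos(max(-1.0, min(1.0, cos_angle)))

def tki(muon_p, proton_p):
    """Return (delta_pT, delta_alphaT, delta_phiT) for (px, py, pz) inputs.

    Assumes the neutrino travels along +z, so (px, py) is the transverse part.
    Angles are in radians; delta_pT has the units of the input momenta.
    """
    mx, my = muon_p[0], muon_p[1]
    px, py = proton_p[0], proton_p[1]
    dx, dy = mx + px, my + py                       # transverse momentum imbalance
    delta_pt = math.hypot(dx, dy)
    delta_alpha = opening_angle(-mx, -my, dx, dy)   # undefined when delta_pT is zero
    delta_phi = opening_angle(-mx, -my, px, py)
    return delta_pt, delta_alpha, delta_phi

# Example with made-up momenta: the proton nearly balances the muon.
print(tki((0.3, 0.0, 1.0), (-0.3, 0.1, 0.5)))
```

When the muon and proton balance exactly in the transverse plane, $\delta p_T$ is zero and $\delta \alpha_T$ has no defined direction, which the sketch reports as NaN rather than an arbitrary angle.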