Transformer-Based Pulse Shape Discrimination in HPGe Detectors with Masked Autoencoder Pre-training

Marta Babicz, Saúl Alonso-Monsalve, Alain Fauquex, Laura Baudis

TL;DR

Transformer-based models that operate directly on digitised waveforms are benchmarked against a GBDT baseline and outperform it across all PSD targets, with the largest gains on the most challenging labels and on the combined PSD-pass definition.

Abstract

Pulse-shape discrimination (PSD) in high-purity germanium (HPGe) detectors is central to rare-event searches such as neutrinoless double-beta decay ($0\nu\beta\beta$), yet conventional approaches compress each waveform into a small set of summary parameters, potentially discarding information in the full time series that is relevant for classification. We benchmark transformer-based models that operate directly on digitised waveforms using the Majorana Demonstrator AI/ML data release. Models are trained to reproduce the collaboration-provided accept/reject labels for four standard PSD cuts and to regress calibrated energy. We compare supervised training from scratch, masked autoencoder (MAE) self-supervised pre-training followed by fine-tuning, and a feature-based gradient-boosted decision tree (GBDT) baseline. Transformers outperform GBDT across all PSD targets, with the largest gains on the most challenging labels and on the combined PSD-pass definition. MAE pre-training improves sample efficiency, reducing labelled-data requirements by factors of 2-4 in low-label regimes. For energy regression, both transformer variants show a small common underestimation on the test split, while fine-tuning modestly narrows the residual distribution. These results motivate follow-up studies of robustness across detectors and operating conditions and of performance near $Q_{\beta\beta}$.
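
As a rough illustration of the MAE pre-training idea mentioned in the abstract, the sketch below splits a waveform into patches, hides a random subset, and scores a reconstruction only on the hidden patches. It follows the generic masked-autoencoder recipe rather than the authors' implementation: the patch length, the 75% mask ratio, the toy waveform, and the helper names `patchify`, `random_mask`, and `masked_mse` are all assumptions made for illustration.

```python
import random


def patchify(waveform, patch_len):
    """Split a 1-D waveform into non-overlapping patches, dropping any remainder."""
    n_patches = len(waveform) // patch_len
    return [waveform[i * patch_len:(i + 1) * patch_len] for i in range(n_patches)]


def random_mask(n_patches, mask_ratio, rng):
    """Return sorted (visible, masked) patch indices for one waveform."""
    n_masked = int(round(n_patches * mask_ratio))
    order = list(range(n_patches))
    rng.shuffle(order)
    masked = sorted(order[:n_masked])
    visible = sorted(order[n_masked:])
    return visible, masked


def masked_mse(target_patches, predicted_patches, masked):
    """Mean squared error over masked patches only (generic MAE-style loss)."""
    total, count = 0.0, 0
    for idx in masked:
        for t, p in zip(target_patches[idx], predicted_patches[idx]):
            total += (t - p) ** 2
            count += 1
    return total / count if count else 0.0


# Toy stand-in for a digitised pulse; patch_len and mask_ratio are assumed values.
rng = random.Random(0)
waveform = [float(i) for i in range(64)]
patches = patchify(waveform, patch_len=8)
visible, masked = random_mask(len(patches), mask_ratio=0.75, rng=rng)
```

In a full pipeline, an encoder would see only the `visible` patches and a decoder would predict the `masked` ones; the pre-trained encoder is then fine-tuned on the labelled PSD and energy targets.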

Paper Structure

This paper contains 25 sections, 8 equations, 9 figures, and 3 tables.

Figures (9)

  • Figure 1: Cross-section of a Majorana Demonstrator-style p-type point-contact (PPC) HPGe detector simulated with SolidStateDetectors.jl (SSD) [Abt:2021SSD]. The colormap shows the point-contact weighting potential $\phi_w(r,z)$, normalised such that $\phi_w=1$ at the p$^{+}$ point contact (green) and $\phi_w=0$ at the outer n$^{+}$ Li-diffused contact (red); grey contours indicate $\phi_w$ equipotentials. White curves show electric-field lines obtained from the SSD field solution under the applied bias (interpretable as hole drift trajectories, which follow $\vec{E}$ toward the p$^{+}$ contact; electrons drift oppositely toward the n$^{+}$ mantle). The dashed grey segment marks the passivated bottom surface between the contacts. Axes show the cylindrical-symmetry coordinates $r$ and $z$ in mm.
  • Figure 2: Illustrative charge-waveform (blue) and current-estimator (red) traces motivating the AvsE selections. AvsE uses the maximum current-estimator amplitude $A$ relative to energy $E$ to separate SSE-like events from MSE-like $\gamma$ events, which can exhibit multiple current peaks and typically smaller corrected AvsE.
  • Figure 3: Classification performance comparison for four PSD labels: DCR, high AvsE, low AvsE, and LQ. Both AUROC (left) and F1 score (right) metrics are shown for models trained from scratch (crosshatch pattern) and fine-tuned after MAE pre-training (diagonal hatch pattern). The fine-tuned model achieves consistently higher performance across all labels, with the most pronounced improvement observed for the LQ cut.
  • Figure 4: Binary PSD-pass (accepted) versus PSD-fail (rejected) classification performance shown as confusion matrices (values are percentages; the colour scale indicates percentage). An event is labelled PSD-pass only if it passes all four collaboration-provided PSD accept/reject labels simultaneously (low AvsE, high AvsE, DCR, and LQ). Each row of panels compares three models: fine-tuned transformer, transformer trained from scratch, and GBDT baseline. The top matrices are normalised by true label (row-normalised), highlighting per-class recall, while the bottom matrices are normalised by predicted label (column-normalised), highlighting precision.
  • Figure 5: Distribution of the relative residual, defined as $(E_{\mathrm{label}} - E_{\mathrm{pred}})/E_{\mathrm{label}}$, for both training approaches. The top panel shows the histograms for fine-tuned (solid blue) and scratch-trained (dashed magenta) models. Statistical parameters ($\mu$ and $\sigma$) quantify the central tendency and spread of each distribution. The bottom panel displays the difference in counts between the two approaches, highlighting that fine-tuning produces more predictions near zero residual.
  • ...and 4 more figures
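
The quantities described in the captions of Figures 4 and 5 can be sketched in a few lines. The example below builds the combined PSD-pass label as the logical AND of the four cut labels, normalises a 2x2 confusion matrix by true label (rows) or by predicted label (columns), and computes the relative residual $(E_{\mathrm{label}} - E_{\mathrm{pred}})/E_{\mathrm{label}}$. All input values are invented toy data, the function names are hypothetical, and using the plain mean and population standard deviation for $\mu$ and $\sigma$ is an assumption, since the captions do not say how those parameters are estimated.

```python
import statistics


def psd_pass(low_avse, high_avse, dcr, lq):
    """An event is PSD-pass only if it passes all four cuts simultaneously."""
    return [bool(a and b and c and d)
            for a, b, c, d in zip(low_avse, high_avse, dcr, lq)]


def confusion_counts(y_true, y_pred):
    """2x2 counts; rows = true label, columns = predicted label (0 = fail, 1 = pass)."""
    counts = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        counts[int(t)][int(p)] += 1
    return counts


def normalise(counts, by):
    """Percentages: by='true' makes rows sum to 100, by='pred' makes columns sum to 100."""
    if by == "true":
        return [[100.0 * c / sum(row) if sum(row) else 0.0 for c in row]
                for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    return [[100.0 * c / col_sums[j] if col_sums[j] else 0.0
             for j, c in enumerate(row)] for row in counts]


def relative_residuals(e_label, e_pred):
    """(E_label - E_pred) / E_label; positive values mean the model underestimates."""
    return [(lab - pred) / lab for lab, pred in zip(e_label, e_pred)]


# Toy accept (1) / reject (0) labels for six events; not real detector data.
low = [1, 1, 1, 0, 1, 1]
high = [1, 1, 0, 1, 1, 1]
dcr = [1, 1, 1, 1, 1, 0]
lq = [1, 0, 1, 1, 1, 1]
y_true = psd_pass(low, high, dcr, lq)
y_pred = [True, False, True, False, False, False]  # hypothetical model output

counts = confusion_counts(y_true, y_pred)
row_norm = normalise(counts, by="true")   # diagonal entries give per-class recall
col_norm = normalise(counts, by="pred")   # diagonal entries give per-class precision

res = relative_residuals([1000.0, 2000.0], [990.0, 2010.0])
mu = statistics.mean(res)
sigma = statistics.pstdev(res)
```

With this sign convention, a positive $\mu$ corresponds to predicted energies that lie below the labelled ones.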