Reducing Simulation Dependence in Neutrino Telescopes with Masked Point Transformers
Felix J. Yu, Nicholas Kamp, Carlos A. Argüelles
TL;DR
This work introduces a self-supervised learning (SSL) pipeline for neutrino telescope reconstruction that shifts most of the training onto unlabeled real data. The backbone, neptune, combines an Event Tokenizer, a Transformer Encoder, and a downstream head. It is pre-trained on masked inputs (masking ratios from $0.75$ to $1.0$) and finetuned with block expansion to preserve the pretrained representations. Across tasks such as directional muon reconstruction and tau-related cascade classification, the SSL models remain robust to unmodeled noise and domain shifts, whereas supervised models degrade when the simulations miss certain effects. The approach reduces simulation dependence and the associated systematic uncertainties, and the code is made available to support real-data analyses and large-scale detectors.
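Block-expansion finetuning is meant to preserve pretrained representations. The sketch below shows one common formulation of block expansion: new blocks are inserted between frozen pretrained blocks and initialized so that each starts as an identity map, which means the expanded network initially computes exactly what the pretrained one did. Whether neptune uses this exact initialization or insertion pattern is an assumption. The toy residual MLP standing in for a Transformer block, and the names `expand_blocks` and `every`, are hypothetical and not taken from the paper's code.

```python
import numpy as np

def residual_block(x, block):
    # Toy stand-in for a Transformer block: x + relu(x @ w_in) @ w_out.
    return x + np.maximum(x @ block["w_in"], 0.0) @ block["w_out"]

def forward(x, blocks):
    # Apply the blocks in sequence.
    for block in blocks:
        x = residual_block(x, block)
    return x

def expand_blocks(pretrained, every):
    """Insert one new block after every `every` pretrained blocks (hypothetical helper).

    Each new block copies w_in from the block before it and zero-initializes
    w_out, so its residual branch outputs 0 and the block starts as an identity.
    Pretrained blocks are marked frozen; only the new blocks are trainable.
    """
    expanded = []
    for i, block in enumerate(pretrained, start=1):
        expanded.append({**block, "trainable": False})
        if i % every == 0:
            expanded.append({
                "w_in": block["w_in"].copy(),
                "w_out": np.zeros_like(block["w_out"]),
                "trainable": True,
            })
    return expanded

rng = np.random.default_rng(0)
d_model, d_hidden = 8, 16
# Four randomly initialized blocks stand in for a pretrained encoder.
pretrained = [
    {"w_in": rng.normal(size=(d_model, d_hidden)) * d_model ** -0.5,
     "w_out": rng.normal(size=(d_hidden, d_model)) * d_hidden ** -0.5}
    for _ in range(4)
]
expanded = expand_blocks(pretrained, every=2)

x = rng.normal(size=(5, d_model))  # 5 tokens of width d_model
print(len(expanded))                                               # 6
print(sum(b["trainable"] for b in expanded))                       # 2
print(np.allclose(forward(x, pretrained), forward(x, expanded)))   # True
```

Because the inserted blocks start as exact identities and the original blocks are frozen, finetuning can only add capacity on top of the pretrained features rather than overwrite them. This is the property the TL;DR attributes to block expansion.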
Abstract
Machine learning techniques in neutrino physics have traditionally relied on simulated data, which provides access to ground-truth labels. However, the accuracy of these simulations and the discrepancies between simulated and real data remain significant concerns, particularly for large-scale neutrino telescopes that operate in complex natural media. In recent years, self-supervised learning has emerged as a powerful paradigm for reducing dependence on labeled datasets. Here, we present the first self-supervised training pipeline for neutrino telescopes, leveraging point cloud transformers and masked autoencoders. By shifting the majority of training to real data, this approach minimizes reliance on simulations, thereby mitigating associated systematic uncertainties. This represents a fundamental departure from previous machine learning applications in neutrino telescopes, paving the way for substantial improvements in event reconstruction and classification.
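Masked-autoencoder pre-training hides a random subset of each event's tokens from the encoder and scores the reconstruction only on the hidden tokens. The following is a minimal sketch of that masking step, assuming each event has already been turned into a set of token vectors by a tokenizer. The 5-feature token layout is hypothetical, and a mean-squared error stands in for whatever reconstruction target and loss the paper actually uses. The ratio $0.75$ is the lower end of the masking range quoted in the TL;DR.

```python
import numpy as np

def random_token_mask(num_tokens, mask_ratio, rng):
    """Split token indices into (visible, masked) for one event."""
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    num_masked = int(round(num_tokens * mask_ratio))
    perm = rng.permutation(num_tokens)
    masked_idx = np.sort(perm[:num_masked])
    visible_idx = np.sort(perm[num_masked:])
    return visible_idx, masked_idx

def masked_mse(pred, target, masked_idx):
    """Mean-squared reconstruction error over masked tokens only."""
    if masked_idx.size == 0:
        return 0.0
    diff = pred[masked_idx] - target[masked_idx]
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
tokens = rng.normal(size=(20, 5))  # hypothetical: 20 tokens x 5 features per event
visible_idx, masked_idx = random_token_mask(len(tokens), mask_ratio=0.75, rng=rng)
encoder_input = tokens[visible_idx]  # only the visible tokens reach the encoder
print(len(visible_idx), len(masked_idx))  # 5 15

# Placeholder "decoder output" of zeros, just to exercise the loss.
pred = np.zeros_like(tokens)
loss = masked_mse(pred, tokens, masked_idx)
```

Scoring only the masked tokens forces the encoder to infer hidden parts of an event from the visible ones. This objective needs no simulation labels, so it can be trained directly on real detector data.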
