Learning Efficient Representations of Neutrino Telescope Events
Felix J. Yu, Nicholas Kamp, Carlos A. Argüelles
TL;DR
Neutrino telescope data are large, high-dimensional, and sparse, and each event is recorded as photon arrival time distributions (PATDs) at individual optical modules (OMs). The paper introduces om2vec, a transformer-based variational autoencoder that maps per-OM PATDs into compact latent representations, enabling efficient reconstruction and downstream analyses such as angular reconstruction. Results show that the latent representations retain essential information and achieve performance comparable to full-timing inputs, while delivering substantial speedups and enabling image-like ML approaches on latent data. The approach is validated on Prometheus-simulated IceCube-like data, with implications for reduced data throughput and real-time analysis. Code and datasets are available on GitHub.
Abstract
Neutrino telescopes detect rare interactions of particles produced in some of the most extreme environments in the Universe. This is accomplished by instrumenting a cubic-kilometer-scale volume of naturally occurring transparent medium with light sensors. Given their substantial size and the high frequency of background interactions, these telescopes amass an enormous quantity of high-variance, high-dimensional data. These attributes create substantial challenges for analyzing and reconstructing interactions, particularly when utilizing machine learning (ML) techniques. In this paper, we present a novel approach, called om2vec, that employs transformer-based variational autoencoders to efficiently represent the detected photon arrival time distributions of neutrino telescope events by learning compact and descriptive latent representations. We demonstrate that these latent representations offer enhanced flexibility and improved computational efficiency, thereby facilitating downstream tasks in data analysis.
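To make the data flow described above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a PATD can be represented as a normalized histogram of one optical module's photon hit times, and it uses a toy linear map as a stand-in for the transformer encoder. The names build_patd, encode, and kl_to_standard_normal, the bin count, the time window, and the latent dimension are all hypothetical choices made for illustration. Only the variational part (a Gaussian latent sampled with the reparameterization trick, plus a KL term) reflects the standard variational autoencoder formulation.

```python
import numpy as np


def build_patd(hit_times_ns, n_bins=64, window_ns=6400.0):
    """Bin one optical module's photon hit times into a normalized histogram.

    n_bins and window_ns are illustrative assumptions, not values from the paper.
    """
    hit_times_ns = np.asarray(hit_times_ns, dtype=float)
    if hit_times_ns.size == 0:
        return np.zeros(n_bins)
    # Measure times relative to the first photon seen by this module.
    rel = hit_times_ns - hit_times_ns.min()
    # Hits later than window_ns fall outside the range and are dropped.
    counts, _ = np.histogram(rel, bins=n_bins, range=(0.0, window_ns))
    return counts / counts.sum()


def encode(patd, w_mu, w_logvar, rng):
    """Toy linear stand-in for the encoder (assumption): returns (z, mu, logvar)."""
    mu = patd @ w_mu
    logvar = patd @ w_logvar
    eps = rng.standard_normal(mu.shape)
    # Reparameterization trick: z = mu + sigma * eps, with sigma = exp(logvar / 2).
    z = mu + np.exp(0.5 * logvar) * eps
    return z, mu, logvar


def kl_to_standard_normal(mu, logvar):
    """KL divergence between N(mu, diag(exp(logvar))) and N(0, I)."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))


# Usage with synthetic hit times (made up for this sketch, not simulation output).
rng = np.random.default_rng(0)
hits = rng.exponential(scale=300.0, size=200) + 1000.0
patd = build_patd(hits)

latent_dim = 8  # hypothetical latent size
w_mu = rng.normal(scale=0.1, size=(patd.size, latent_dim))
w_logvar = rng.normal(scale=0.1, size=(patd.size, latent_dim))

z, mu, logvar = encode(patd, w_mu, w_logvar, rng)
kl = kl_to_standard_normal(mu, logvar)
```

In this sketch each module's 64-bin histogram is reduced to an 8-component vector z. A fixed-length vector per module is what would allow events to be arranged as image-like arrays for downstream models, in the spirit of the approach summarized above.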
