
Panda: Self-distillation of Reusable Sensor-level Representations for High Energy Physics

Samuel Young, Kazuhiro Terao

TL;DR

Panda presents a self-distilled, sensor-level pretraining framework for LArTPC data that learns reusable per-point embeddings from unlabeled raw charge clouds using a point-native hierarchical encoder and prototype-based self-distillation. With lightweight heads, the model supports semantic segmentation and panoptic reconstruction. It achieves large gains in data efficiency (up to 1,000× fewer labels) and competitive particle identification with either a frozen or a fine-tuned backbone. Key contributions include the prototype-on-hypersphere pretraining objective, a multi-scale sparse encoder, and a Mask2Former-style panoptic head operating directly on raw 3D measurements. This work demonstrates strong, detector-agnostic representations that transfer to multiple reconstruction tasks, reducing calibration and simulation burdens and enabling future multimodal and real-data extensions in high-energy physics detectors.
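As a rough illustration of a prototype-on-hypersphere objective, the minimal NumPy sketch below does three things. It L2-normalizes per-point embeddings and a set of K prototype vectors, turns their cosine similarities into a temperature-scaled softmax, and scores a student against a teacher with per-point cross-entropy. This is not the authors' implementation. The function names are hypothetical, and so are the temperatures (0.1 for the student, 0.04 for the teacher, borrowed from common DINO-style defaults). The sketch also assumes that student and teacher rows refer to the same physical points, and it omits teacher centering and any other regularizers.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Project vectors onto the unit hypersphere."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def prototype_probs(embeddings, prototypes, temperature):
    """Per-point softmax over cosine similarities to K prototypes.

    embeddings: (N, D) per-point features; prototypes: (K, D).
    Returns an (N, K) array whose rows sum to 1.
    """
    logits = l2_normalize(embeddings) @ l2_normalize(prototypes).T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def distillation_loss(student_emb, teacher_emb, prototypes,
                      student_temp=0.1, teacher_temp=0.04):
    """Mean per-point cross-entropy H(teacher, student).

    Hypothetical assumption: row i of student_emb and row i of teacher_emb
    describe the same physical point seen in two different views.
    """
    # Teacher targets are sharper (lower temperature) and would be treated as
    # constants (stop-gradient) in real training; NumPy has no autograd here.
    p_t = prototype_probs(teacher_emb, prototypes, teacher_temp)
    p_s = prototype_probs(student_emb, prototypes, student_temp)
    return float(-(p_t * np.log(p_s + 1e-12)).sum(axis=1).mean())
```

Because both embeddings and prototypes are normalized, a point's prototype distribution depends only on the direction of its embedding, so rescaling an embedding leaves its assignment unchanged.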

Abstract

Liquid argon time projection chambers (LArTPCs) provide dense, high-fidelity 3D measurements of particle interactions and underpin current and future neutrino and rare-event experiments. Physics reconstruction typically relies on complex, detector-specific pipelines built either from tens of hand-engineered pattern recognition algorithms or from cascades of task-specific neural networks. The latter require extensive labeled simulation, which in turn depends on a careful, time-consuming calibration process. We introduce Panda, a model that learns reusable sensor-level representations directly from raw, unlabeled LArTPC data. Panda couples a hierarchical sparse 3D encoder with a multi-view, prototype-based self-distillation objective. On a simulated dataset, Panda substantially improves label efficiency and reconstruction quality, beating the previous state-of-the-art (SOTA) semantic segmentation model with 1,000× fewer labels. We also show that a single set-prediction head trained on frozen Panda outputs yields particle identification comparable to SOTA reconstruction tools. This head is 1/20th the size of the backbone and uses no physical priors. Full fine-tuning further improves performance across all tasks.
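The abstract does not spell out how the hierarchical sparse encoder forms its coarser scales. One common construction for point-native encoders is grid pooling: points are bucketed into voxels of increasing edge length, and their coordinates and features are averaged. The sketch below shows only that downsampling skeleton. The strides and function names are hypothetical. A real encoder would interleave learned attention or sparse-convolution blocks between levels and may use a different pooling operator.

```python
import numpy as np

def grid_pool(coords, feats, stride):
    """Average-pool points that fall in the same voxel of edge length `stride`.

    coords: (N, 3) float positions; feats: (N, C) features.
    Returns voxel centroids (M, 3), pooled features (M, C), and the (N,)
    index mapping each input point to its voxel (useful for unpooling).
    """
    keys = np.floor(coords / stride).astype(np.int64)
    _, inverse, counts = np.unique(keys, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)  # some NumPy versions return shape (N, 1) here
    m = counts.shape[0]
    pooled_xyz = np.zeros((m, 3))
    pooled_feat = np.zeros((m, feats.shape[1]))
    np.add.at(pooled_xyz, inverse, coords)  # unbuffered scatter-add per voxel
    np.add.at(pooled_feat, inverse, feats)
    pooled_xyz /= counts[:, None]
    pooled_feat /= counts[:, None]
    return pooled_xyz, pooled_feat, inverse

def build_hierarchy(coords, feats, strides=(2.0, 4.0, 8.0)):
    """Return [(coords, feats), ...] from finest to coarsest level.

    Strides are multiples of each other, so voxel boundaries nest and every
    coarse voxel is a union of finer ones.
    """
    levels = [(coords, feats)]
    for s in strides:
        coords, feats, _ = grid_pool(coords, feats, s)
        levels.append((coords, feats))
    return levels
```

Each level is sparse: only occupied voxels are kept, which matters for LArTPC events where most of the detector volume contains no charge.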

Paper Structure

This paper contains 44 sections, 11 equations, 15 figures, 4 tables.

Figures (15)

  • Figure 1: Panda overview. Raw charge depositions corresponding to particle trajectories recorded by a time projection chamber (TPC) (top left) are passed through a point-native hierarchical encoder pre-trained via self-distillation (Panda) to produce a shared embedding (top right). The same pretrained features are used for three downstream tasks with lightweight heads (bottom): semantic segmentation of geometric motifs; particle-level panoptic segmentation with per-particle masks and IDs ($\gamma,e,\mu,\pi,p$); and interaction-level partitioning that groups causally related particles. A single sensor-level backbone supports all tasks without detector-specific heuristics.
  • Figure 2: t-SNE visualization of Panda embeddings. We visualize per-point embeddings from 1,000 images via t-SNE. The embeddings exhibit clear structure corresponding to the inter-class diversity and intra-class multi-modality of LArTPC images.
  • Figure 3: Panoptic segmentation architecture. We adopt a Mask2Former-like set prediction module [cheng2021perpixelclassificationneedsemantic] for the panoptic segmentation task. The image is first encoded with the pretrained Panda backbone. A small MLP removes low-energy deposits (LEDs) from the image. The remaining point embeddings are fed into the cross-attention mechanism of a small three-layer transformer that decodes learned queries into a set of instance kernels. The mask for each kernel is the sigmoid of the dot product between the kernel and the per-point features. A small MLP classifies each kernel into one of five particle types or a no-object class. The full reconstructed image, with the LEDs added back in, is then assembled from this set of predictions.
  • Figure 4: Self-distillation setup. Global and local/masked views are fed to separate student ($f_\theta$) and teacher ($f_\xi$) encoders that share the Panda backbone architecture. Each network produces a per-point distribution over prototypes. Cross-entropy losses align the student's predictions on local and masked views with the teacher's predictions on the corresponding unmasked global views, enforcing consistent prototype assignments across views. The teacher's weights are updated as an exponential moving average of the student's weights, with the momentum $\tau$ scheduled from 0.994 to 1.0 over the course of pretraining.
  • Figure 5: Global, local and masked views used in Panda. Example LArTPC event with two global crops and several local views. Local crops cover contiguous regions of the event. Masked global views hide large subsets of patches from the student while the teacher sees the full event.
  • ...and 10 more figures
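The mask and class decoding described in the Figure 3 caption can be illustrated with a small NumPy sketch. It assumes the transformer has already produced Q instance kernels and per-query class logits over the five particle types plus a no-object class. Several details are hypothetical choices for illustration, not taken from the paper: the class ordering, the 0.5 mask threshold, and the rule that assigns each point to its highest-scoring kept mask. LED filtering and re-insertion are left out.

```python
import numpy as np

# Hypothetical class order: gamma, e, mu, pi, p, then the no-object class last.
CLASSES = ["photon", "electron", "muon", "pion", "proton", "no-object"]

def decode_panoptic(point_feats, kernels, class_logits, mask_thresh=0.5):
    """point_feats: (N, D); kernels: (Q, D); class_logits: (Q, 6).

    Returns a per-point instance id (-1 = unassigned), indexing into the list
    of kept instances, and the predicted class name of each kept instance.
    """
    # Soft mask per query: sigmoid of the kernel / point-feature dot product.
    masks = 1.0 / (1.0 + np.exp(-(point_feats @ kernels.T)))  # (N, Q)
    cls = class_logits.argmax(axis=1)
    keep = np.flatnonzero(cls != len(CLASSES) - 1)  # drop no-object queries
    instance_id = np.full(point_feats.shape[0], -1)
    if keep.size == 0:
        return instance_id, []
    kept = masks[:, keep]
    best = kept.argmax(axis=1)  # highest-scoring kept mask for each point
    assigned = kept[np.arange(kept.shape[0]), best] > mask_thresh
    instance_id[assigned] = best[assigned]
    return instance_id, [CLASSES[c] for c in cls[keep]]
```

Taking a per-point argmax over the kept masks is one simple way to make the output a partition, so that no point belongs to two particles; overlapping-mask handling in the actual head may differ.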
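The teacher update in the Figure 4 caption is a standard exponential moving average. The sketch below assumes a cosine-shaped ramp for the momentum $\tau$ from 0.994 to 1.0; the caption gives only the endpoints, so the ramp shape is an assumption. Network weights are represented as a hypothetical dictionary of NumPy arrays.

```python
import math
import numpy as np

def ema_momentum(step, total_steps, base=0.994, final=1.0):
    """Momentum tau at `step`, ramping from `base` to `final` along a cosine.

    The cosine shape is an assumption; only the endpoints come from the caption.
    """
    progress = min(step / max(total_steps, 1), 1.0)
    return final - (final - base) * (math.cos(math.pi * progress) + 1.0) / 2.0

def ema_update(teacher, student, tau):
    """In place: teacher <- tau * teacher + (1 - tau) * student, per parameter."""
    for name in teacher:
        teacher[name] = tau * teacher[name] + (1.0 - tau) * student[name]
```

As $\tau$ approaches 1.0 the teacher changes more and more slowly, and at $\tau = 1.0$ it is effectively frozen for the final steps of pretraining.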
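The local and masked views in the Figure 5 caption can be approximated by two simple point-index selections. In the sketch below, a local view keeps the points inside an axis-aligned cube around a chosen center, which is one way to obtain a contiguous region. A masked view buckets points into cubic patches and hides a random fraction of them from the student, while the teacher would receive all points. The cube shape, patch size, mask ratio, and function names are assumptions for illustration; the caption does not specify how Panda defines crops or patches.

```python
import numpy as np

def local_crop(coords, center, half_size):
    """Indices of points inside an axis-aligned cube around `center`."""
    inside = np.all(np.abs(coords - center) <= half_size, axis=1)
    return np.flatnonzero(inside)

def mask_patches(coords, patch_size, mask_ratio, rng):
    """Hide a random fraction of cubic patches; return indices of visible points.

    Patches are the occupied voxels of edge `patch_size`; a hypothetical choice.
    """
    keys = np.floor(coords / patch_size).astype(np.int64)
    _, patch_id = np.unique(keys, axis=0, return_inverse=True)
    patch_id = patch_id.reshape(-1)  # guard against (N, 1) inverse shapes
    n_patches = int(patch_id.max()) + 1
    n_masked = int(round(mask_ratio * n_patches))
    masked = rng.choice(n_patches, size=n_masked, replace=False)
    visible = ~np.isin(patch_id, masked)
    return np.flatnonzero(visible)
```

Masking whole patches rather than individual points hides spatially coherent regions, so the student cannot trivially recover a hidden point from its immediate neighbors.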