Learning Symmetry-Independent Jet Representations via Jet-Based Joint Embedding Predictive Architecture

Subash Katel, Haoyang Li, Zihan Zhao, Raghav Kansal, Farouk Mokhtar, Javier Duarte

TL;DR

The paper addresses the challenge of training jet-related models when labeled data is scarce or mismatched. It introduces J-JEPA, a self-supervised, augmentation-free pretraining framework that predicts target-subjet representations from context-subjet representations, using the target positions as hints. The target encoder is stabilized via an exponential moving average (EMA), and predictions are made in representation space with an $L_2$ loss. The approach enables cross-task applicability by removing the need for hand-crafted augmentations tailored to each downstream task, and it demonstrates that pretrained representations outperform randomly initialized baselines for jet tagging, especially under limited labeled data. Key contributions include the physical positional encoding, two embedding strategies for subjets, and a masking scheme inspired by I-JEPA, all validated on JetClass pretraining and Top Tagging finetuning. The findings suggest J-JEPA is a scalable path toward large-scale, cross-task foundation models in jet physics, potentially reducing reliance on labeled simulations and enabling robust transfer to real data.
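The two training ingredients named in this summary, the EMA-stabilized target encoder and the $L_2$ loss in representation space, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `ema_update` and `l2_loss`, the flat-list parameter layout, the momentum value, and the choice of a mean squared distance as the concrete form of the $L_2$ loss are all assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def ema_update(target_params: Vector, context_params: Vector,
               momentum: float = 0.99) -> Vector:
    """Move each target-encoder parameter toward the context-encoder value.

    The momentum value is an assumed placeholder, not taken from the paper.
    """
    return [momentum * t + (1.0 - momentum) * c
            for t, c in zip(target_params, context_params)]


def l2_loss(predicted: List[Vector], encoded: List[Vector]) -> float:
    """Mean squared Euclidean distance between predicted and encoded
    target-subjet representations (one vector per target subjet).

    Whether the paper averages or sums, and whether it squares the norm,
    is an assumption here.
    """
    total = 0.0
    for pred, enc in zip(predicted, encoded):
        total += sum((p - e) ** 2 for p, e in zip(pred, enc))
    return total / len(predicted)


# Hypothetical toy values: one target subjet with a 2-dimensional representation.
loss = l2_loss([[1.0, 2.0]], [[1.0, 0.0]])
new_target = ema_update([0.0, 1.0], [1.0, 1.0], momentum=0.9)
```

In this sketch only the context encoder and predictor would receive gradients from the loss, while the target encoder is updated solely through `ema_update`, which is the usual way an EMA target is kept stable.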

Abstract

In high energy physics, self-supervised learning (SSL) methods have the potential to aid in the creation of machine learning models without the need for labeled datasets for a variety of tasks, including those related to jets -- narrow sprays of particles produced by quarks and gluons in high energy particle collisions. This study introduces an approach to learning jet representations without hand-crafted augmentations using a jet-based joint embedding predictive architecture (J-JEPA), which aims to predict various physical targets from an informative context. As our method does not require hand-crafted augmentation like other common SSL techniques, J-JEPA avoids introducing biases that could harm downstream tasks. Since different tasks generally require invariance under different augmentations, this training without hand-crafted augmentation enables versatile applications, offering a pathway toward a cross-task foundation model. We finetune the representations learned by J-JEPA for jet tagging and benchmark them against task-specific representations.

Paper Structure

This paper contains 11 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: The J-JEPA architecture begins by splitting the large-radius jet (large black cone) into target subjets and context subjets. The context encoder and target encoder then separately generate representations for the context subjets and the target subjets. Using the positions of the target subjets as additional information (hints), the predictor takes the context representations and predicts the representations of the target subjets. Finally, the $L_2$ loss function is used to compare the predicted target subjet representations with the encoded target subjet representations, minimizing the difference between them.
  • Figure 2: Comparison of the background rejection metric $1/\varepsilon_B(\varepsilon_S=0.5)$ as a function of the number of labeled training samples used for finetuning for a pretrained model versus the one trained from scratch (top), for a pretrained model using Cls Attn versus Flatten for aggregation (bottom left), and for a pretrained model using an attention-based embedding versus an MLP (bottom right). The shaded bands represent standard deviations calculated from 5 identical trials with random initialization.
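
The background rejection metric in Figure 2, $1/\varepsilon_B(\varepsilon_S=0.5)$, is the inverse of the background efficiency at the score threshold that keeps 50% of signal jets. The sketch below shows one way it could be computed from classifier scores. The function name `background_rejection`, the convention that higher scores mean more signal-like, and the handling of ties are assumptions, not details taken from the paper.

```python
import math
from typing import Sequence


def background_rejection(signal_scores: Sequence[float],
                         background_scores: Sequence[float],
                         signal_eff: float = 0.5) -> float:
    """Return 1 / eps_B at the threshold that keeps `signal_eff` of signal jets.

    Assumes higher scores are more signal-like. With tied scores the
    realized signal efficiency can be slightly above `signal_eff`.
    """
    if not signal_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    ranked = sorted(signal_scores, reverse=True)
    # Number of signal jets that must pass to reach the requested efficiency.
    n_keep = max(1, math.ceil(signal_eff * len(ranked)))
    threshold = ranked[n_keep - 1]
    n_bkg_pass = sum(1 for s in background_scores if s >= threshold)
    eps_b = n_bkg_pass / len(background_scores)
    return math.inf if eps_b == 0 else 1.0 / eps_b


# Hypothetical toy scores: the threshold 0.8 keeps 2 of 4 signal jets,
# and 1 of 4 background jets passes it, so the rejection is 4.0.
rejection = background_rejection([0.9, 0.8, 0.7, 0.6], [0.85, 0.3, 0.2, 0.1])
```

On realistic sample sizes the same quantity is usually read off a ROC curve. The counting version above is kept dependency-free so the definition stays visible.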