HEP-JEPA: A foundation model for collider physics using joint embedding predictive architecture

Jai Bardhan, Radhikesh Agrawal, Abhiram Tilak, Cyrin Neeraj, Subhadip Mitra

TL;DR

We present a transformer-based foundation model for tasks at high-energy particle colliders such as the Large Hadron Collider, trained with a self-supervised strategy inspired by the Joint Embedding Predictive Architecture.

Abstract

We present a transformer-based foundation model for tasks at high-energy particle colliders such as the Large Hadron Collider. We train the model to classify jets using a self-supervised strategy inspired by the Joint Embedding Predictive Architecture. We pre-train the model on the JetClass dataset, which contains 100M jets of various known particles, with a data-centric approach: the model uses a fraction of the jet constituents as the context to predict the embeddings of the unseen target constituents. Our pre-trained model performs well on other datasets for standard classification benchmark tasks. We test our model on two additional downstream tasks: top tagging and differentiating light-quark jets from gluon jets. We also evaluate our model with task-specific metrics and baselines and compare it with state-of-the-art models in high-energy physics. Project site: https://hep-jepa.github.io/
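
The context-to-target prediction described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`split_context_target`, `encode`, `jepa_loss`), the single linear layer standing in for each transformer encoder, the mean-squared-error loss, the use of the first two feature columns as a positional hint, and the target encoder sharing the context weights are all assumptions made for illustration.

```python
import numpy as np


def split_context_target(n_constituents, context_fraction, rng):
    """Randomly partition constituent indices into context and target sets."""
    perm = rng.permutation(n_constituents)
    n_context = max(1, int(round(context_fraction * n_constituents)))
    n_context = min(n_context, n_constituents - 1)  # keep at least one target
    return np.sort(perm[:n_context]), np.sort(perm[n_context:])


def encode(features, weights):
    """Toy stand-in for a transformer encoder: one linear map plus tanh."""
    return np.tanh(features @ weights)


def jepa_loss(jet, context_fraction=0.5, embed_dim=8, seed=0):
    """Embedding-prediction loss for one jet of shape (n_constituents, n_features).

    Assumes n_constituents >= 2 and that columns 0 and 1 hold a positional
    hint for each constituent (a hypothetical layout, not from the source).
    """
    rng = np.random.default_rng(seed)
    n_constituents, n_features = jet.shape
    ctx_idx, tgt_idx = split_context_target(n_constituents, context_fraction, rng)

    # Randomly initialised toy weights; nothing here is trained.
    w_context = rng.normal(size=(n_features, embed_dim))
    w_target = w_context.copy()  # assumption: target encoder shares the weights
    w_pred = rng.normal(size=(embed_dim + 2, embed_dim))

    ctx_emb = encode(jet[ctx_idx], w_context)  # (n_ctx, embed_dim)
    tgt_emb = encode(jet[tgt_idx], w_target)   # (n_tgt, embed_dim)

    # The predictor sees the pooled context plus the position of each masked
    # constituent, and outputs one predicted embedding per target.
    pooled = np.tile(ctx_emb.mean(axis=0), (len(tgt_idx), 1))
    pred_in = np.concatenate([pooled, jet[tgt_idx, :2]], axis=1)
    pred_emb = pred_in @ w_pred                # (n_tgt, embed_dim)

    # The loss compares embeddings, not raw constituent features.
    loss = float(np.mean((pred_emb - tgt_emb) ** 2))
    return loss, ctx_idx, tgt_idx
```

Calling `jepa_loss(jet)` on an array of shape `(10, 4)` splits the ten constituents into five context and five target indices and returns a non-negative scalar loss. The point of the sketch is only that the loss is computed in embedding space rather than on the raw constituent features.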

Paper Structure

This paper contains 32 sections, 5 equations, 3 figures, 8 tables.

Figures (3)

  • Figure 1: Schematic diagram illustrating the working of the HEP-JEPA model. The model has a structure similar to vision transformers. In the first step, the entire jet is divided into patches using a particle jet tokeniser. These tokens are then masked to form the context and target blocks. Each block is fed into the respective encoder to generate the embeddings. The context embedding, along with the special mask tokens, is used by the predictor to predict the embedding of the masked target blocks.
  • Figure 2: Validation loss vs. training step for the two benchmark models trained in a few-shot learning setting for jet classification on the JetClass dataset with $0.5\%$ labels (i.e., $50000$ training samples). One model is trained from scratch, whereas the pre-trained HEP-JEPA model is fine-tuned. The validation loss falls quickly for the HEP-JEPA model: it reaches the minimum validation loss of the model trained from scratch three times faster.
  • Figure 3: The t-SNE plot of the pooled embedding obtained for samples within the JetClass dataset.