OmniLearned: A Foundation Model Framework for All Tasks Involving Jet Physics
Wahid Bhimji, Chris Harris, Vinicius Mikuni, Benjamin Nachman
TL;DR
This work addresses the need for a scalable foundation model in jet physics by introducing OmniLearned, an upgraded PET v2-based framework trained on over one billion jets and paired with a unified data-access software stack. The approach combines classification and generation objectives, via a multi-task loss and a diffusion/flow-matching generative head, to learn rich per-jet representations that transfer across tasks. Key contributions include architectural enhancements (local/global attention with physics-informed biases and multiple task heads), a massive, diverse pretraining dataset with broad labels, and state-of-the-art performance on top-quark tagging, b-/c-tagging with ATLAS full simulation, and anomaly detection on CMS open data. The results indicate improved discovery potential across past, current, and future collider experiments. Larger OmniLearned models deliver the strongest gains, albeit at higher compute cost, and the framework is broadly applicable beyond jet physics.
Abstract
Foundation models use large datasets to build an effective representation of data that can be deployed on diverse downstream tasks. Previous research developed the OmniLearn foundation model for jet physics, using unique properties of particle physics, and showed that it could significantly advance discovery potential across collider experiments. This paper introduces a major upgrade, resulting in the OmniLearned framework. This framework has three new elements: (1) updates to the model architecture and training, (2) a training set of over one billion jets, and (3) well-documented software for accessing all datasets and models. We demonstrate OmniLearned with three representative tasks: top-quark jet tagging with the community Delphes-based benchmark dataset, b-tagging with ATLAS full simulation, and anomaly detection with CMS experimental data. In each case, OmniLearned is the state of the art, further expanding the discovery potential of past, current, and future collider experiments.
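
The TL;DR describes a multi-task objective that pairs a classification loss with a diffusion/flow-matching generative loss. As a rough illustration of how such an objective can be assembled, the sketch below adds a softmax cross-entropy term to a linear-path flow-matching term. It is not the OmniLearned implementation: the function names, the `gen_weight` parameter, the linear interpolation path, and the toy tensor shapes are all assumptions made for this example.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy. logits: (N, C); labels: (N,) integer class ids."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def flow_matching_loss(velocity_fn, x1, rng):
    """Linear-path flow matching (an assumed variant, not taken from the paper).

    Draws noise x0 and a time t per jet, forms x_t = (1 - t) * x0 + t * x1,
    and regresses the predicted velocity onto the target x1 - x0.
    """
    x0 = rng.standard_normal(x1.shape)
    # One time value per jet, broadcast over the particle and feature axes.
    t = rng.uniform(size=(x1.shape[0],) + (1,) * (x1.ndim - 1))
    xt = (1.0 - t) * x0 + t * x1
    target = x1 - x0
    return np.mean((velocity_fn(xt, t) - target) ** 2)

def multitask_loss(logits, labels, velocity_fn, x1, rng, gen_weight=1.0):
    """Classification loss plus a weighted generative loss (hypothetical weighting)."""
    return cross_entropy(logits, labels) + gen_weight * flow_matching_loss(velocity_fn, x1, rng)

# Toy usage: 4 jets, 8 particles each, 3 features per particle, 2 classes.
rng = np.random.default_rng(0)
jets = rng.standard_normal((4, 8, 3))
labels = np.array([0, 1, 0, 1])
logits = np.zeros((4, 2))                     # stand-in for a classifier head
zero_velocity = lambda xt, t: np.zeros_like(xt)  # stand-in for a generative head
loss = multitask_loss(logits, labels, zero_velocity, jets, rng)
```

In a real model both heads would share a backbone, so gradients from the two terms shape a common per-jet representation; here the heads are replaced by fixed stand-ins to keep the example self-contained.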
