Bumblebee: Foundation Model for Particle Physics Discovery
Andrew J. Wildridge, Jack P. Rodgers, Ethan M. Colbert, Yao Yao, Andreas W. Jung, Miaoyuan Liu
TL;DR
Particle-physics events are unordered collections of particles, so the paper removes positional encodings and embeds particle 4-vectors directly. This keeps the model invariant to sequence order and lets it jointly model generator- and reconstruction-level information. The resulting model, Bumblebee, is a BERT-inspired transformer with a Cloze-like (masked) pre-training objective, pre-trained on dileptonic ttbar events. Key findings are a 10-20% improvement in ttbar mass reconstruction resolution, an AUROC of 0.877 for toponium discrimination, and an AUROC of 0.625 for initial-state classification, indicating potential for discovering new effects. The authors argue that the approach applies to a broad range of collider processes.
Abstract
Bumblebee is a foundation model for particle physics discovery, inspired by BERT. By removing positional encodings and embedding particle 4-vectors, Bumblebee captures both generator- and reconstruction-level information while ensuring sequence-order invariance. Pre-trained on a masked task, it improves dileptonic top quark reconstruction resolution by 10-20% and excels in downstream tasks, including toponium discrimination (AUROC 0.877) and initial state classification (AUROC 0.625). The flexibility of Bumblebee makes it suitable for a wide range of particle physics applications, especially the discovery of new particles.
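The sequence-order invariance mentioned above follows from a general property of self-attention: without positional encodings, permuting the input particles only permutes the output rows, so any symmetric pooling of those rows is unchanged. The following is a minimal sketch of that property in plain Python, not the Bumblebee implementation. The single linear embedding of the 4-vector, the single attention head, the width `D`, the random weights, and the toy 4-vector values are all assumptions made for illustration.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters (assumption)."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

def self_attention(particles, w_embed, w_q, w_k, w_v):
    """Single-head self-attention over 4-vectors with no positional encoding."""
    x = matmul(particles, w_embed)  # embed each (E, px, py, pz) into D dimensions
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of the key matrix
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)

def mean_pool(rows):
    """Average over particles: a symmetric, order-independent reduction."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def close(a, b, tol=1e-9):
    """Compare two vectors up to floating-point rounding."""
    return all(math.isclose(x, y, abs_tol=tol) for x, y in zip(a, b))

rng = random.Random(0)
D = 8  # hypothetical embedding width
w_embed = random_matrix(4, D, rng)
w_q, w_k, w_v = (random_matrix(D, D, rng) for _ in range(3))

# Toy event: four particles as (E, px, py, pz) in arbitrary units (made-up values).
event = [
    [0.50, 0.30, 0.40, 0.00],
    [0.90, -0.20, 0.10, 0.85],
    [0.35, 0.05, -0.30, 0.15],
    [1.20, -0.60, -0.40, 0.95],
]
perm = [2, 0, 3, 1]
shuffled = [event[i] for i in perm]

out = self_attention(event, w_embed, w_q, w_k, w_v)
out_shuffled = self_attention(shuffled, w_embed, w_q, w_k, w_v)

# Equivariance: row j of the shuffled output matches row perm[j] of the original.
equivariant = all(close(out_shuffled[j], out[perm[j]]) for j in range(len(perm)))
# Invariance: the pooled event representation does not depend on particle order.
invariant = close(mean_pool(out), mean_pool(out_shuffled))
print(equivariant, invariant)
```

Adding a positional encoding to `x` before computing `q`, `k`, and `v` would tie each output row to its input position and break both checks, which is why dropping it suits unordered particle collections.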
