A Methodology for Developing Foundational Transformer Models in Collider Physics Analysis
E. Abasov, L. Dudko, E. Iudin, A. Markina, P. Volkov, M. Perfilov, A. Zaborenko
TL;DR
This work addresses the fragmentation of collider analyses across final-state signatures by proposing a universal transformer framework trained on multi-process collider data. The approach combines a unified input space, masked variable reconstruction, and multi-task pre-training across five top-quark multiplicity classes to learn cross-process patterns in Standard Model (SM) physics and enable transfer to downstream tasks. Key contributions include a transformer architecture that preserves process-specific interactions through adaptive attention, promising representation learning and robustness for rare three- and four-top topologies (3t and 4t), and a downstream entropy-based method for dark matter (DM) searches using SM-trained models. The framework aims to unify disparate collider analyses into a scalable, cross-process methodology with the potential to enhance sensitivity to rare processes.
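The summary does not spell out the entropy-based method, so the following is only a minimal sketch of one plausible reading: assuming the SM-trained model outputs logits over the five top-quark multiplicity classes, the Shannon entropy of the resulting class probabilities could serve as a per-event score, with high entropy flagging events that fit no SM class well. The function names and the example logits are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise maximum for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def entropy_score(logits):
    """Shannon entropy (in nats) of the predicted class distribution, per event."""
    p = softmax(logits)
    # Clip probabilities so that log(0) never occurs.
    return -(p * np.log(np.clip(p, 1e-12, 1.0))).sum(axis=-1)

# Hypothetical logits over five classes for two events.
logits = np.array([
    [8.0, 0.0, 0.0, 0.0, 0.0],  # confidently assigned to one SM class
    [0.0, 0.0, 0.0, 0.0, 0.0],  # maximally ambiguous between all classes
])
scores = entropy_score(logits)
```

Under this assumption the second event receives the maximum possible score, ln 5, while the first scores close to zero; a threshold on the score would then select candidate events.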
Abstract
We present a methodology for training foundational transformer models capable of processing collider data with diverse kinematic signatures. Our universal foundation model is designed for the simultaneous analysis of all processes involving the production of one to four top quarks, together with their corresponding background processes. The approach employs multi-task pre-training on combined datasets of simulated events, enabling the model to capture the full spectrum of interaction physics and to extract universal patterns across different final states prior to task-specific fine-tuning. This unified architecture eliminates the need for separate analysis frameworks for different final-state signatures and specific tasks. The transformer-based pre-training strategy explicitly preserves unique interaction patterns through adaptive attention mechanisms while establishing cross-process correlations. We plan to demonstrate how this architecture maintains sensitivity to rare high-multiplicity topologies (3t and 4t) without compromising performance on conventional channels ($t\bar{t}$, $tX$, $t\bar{t}H$), effectively bridging the gap between disparate analysis paradigms in collider physics.
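As an illustration of how events with different final states can share one input space for pre-training, the sketch below pads every event to a fixed number of object slots, hides a random subset of the real variables, and scores a reconstruction only on the hidden entries. It is a sketch under stated assumptions, not the authors' implementation: the array sizes, the zero-valued stand-in for a mask token, the mean-squared-error loss, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: events, object slots per event, variables per object.
N_EVENTS, MAX_OBJECTS, N_FEATURES = 4, 6, 4

# Unified input: every event is padded to the same number of object slots.
x = rng.normal(size=(N_EVENTS, MAX_OBJECTS, N_FEATURES))
n_objects = np.array([2, 3, 5, 6])  # real object count per event
valid = np.arange(MAX_OBJECTS)[None, :] < n_objects[:, None]  # (events, slots)
x = x * valid[..., None]  # zero out the padding slots

def mask_variables(x, valid, mask_fraction, rng):
    """Hide a random subset of real (non-padding) variables."""
    candidates = np.broadcast_to(valid[..., None], x.shape)
    masked = candidates & (rng.random(x.shape) < mask_fraction)
    x_in = np.where(masked, 0.0, x)  # 0.0 stands in for a learned mask token
    return x_in, masked

def reconstruction_loss(pred, target, masked):
    """Mean squared error computed over the masked entries only."""
    n = masked.sum()
    if n == 0:
        return 0.0
    return float((((pred - target) ** 2) * masked).sum() / n)

x_in, masked = mask_variables(x, valid, mask_fraction=0.3, rng=rng)

# Placeholder "model" that returns its corrupted input unchanged;
# a transformer encoder would take its place in practice.
loss = reconstruction_loss(x_in, x, masked)
```

Restricting both the masking and the loss to valid slots keeps padding from contributing to the objective, so events with few objects (for example single-top candidates) and events with many (4t candidates) can be batched together.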
