A Methodology for Developing Foundational Transformer Models in Collider Physics Analysis

E. Abasov, L. Dudko, E. Iudin, A. Markina, P. Volkov, M. Perfilov, A. Zaborenko

TL;DR

This work addresses the fragmentation of collider analyses across final-state signatures by proposing a universal transformer framework trained on multi-process collider data. The approach combines a unified input space, masked variable reconstruction, and multi-task pre-training across five top-quark multiplicity classes to learn cross-process patterns in Standard Model (SM) physics and enable transfer to downstream tasks. Key contributions include demonstrating a transformer architecture capable of preserving process-specific interactions through adaptive attention, showing promising representation learning and robustness for rare three- and four-top topologies (3t and 4t), and illustrating a downstream entropy-based method for dark matter (DM) searches using SM-trained models. The proposed framework aims to bridge disparate collider analyses into a scalable, cross-process methodology with the potential to enhance sensitivity to rare processes in collider physics.

Abstract

We present a methodology for training foundational transformer models capable of processing collider data with diverse kinematic signatures. Our universal foundation model is designed for the simultaneous analysis of all processes involving the production of one to four top quarks, together with their corresponding background processes. The approach employs multi-task pre-training on combined datasets of simulated events, enabling the model to capture the full spectrum of interaction physics while extracting universal patterns across different final states prior to task-specific fine-tuning. This unified architecture eliminates the need for separate analysis frameworks for different final-state signatures and specific tasks. The transformer-based pre-training strategy explicitly preserves unique interaction patterns through adaptive attention mechanisms while establishing cross-process correlations. We plan to demonstrate how this architecture maintains sensitivity to rare high-multiplicity topologies (3t and 4t) without compromising performance on conventional channels ($t\bar{t}$, $tX$, $t\bar{t}H$), effectively bridging the gap between disparate analysis paradigms in collider physics.

Paper Structure

This paper contains 6 sections, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Graphical representation of the Transformer model. Input features are embedded through a LinearEmbedding layer [gorishniy2022embeddings], then passed through several Attention Blocks employing Multi-Headed Self-Attention. The output of the Transformer is flattened and passed through the final Linear Layer to convert its dimension to $N_{class}$ for classification. The same approach can be used to convert the output of the model into any number of dimensions for regression or classification tasks.
  • Figure 2: Example Feynman diagrams for process groups with different top-quark counts.
  • Figure 3: Examples of pre-training results for masked reconstruction (average reconstruction error for the test split and event distributions for some of the reconstructed variables). Masking probability: 30%. Variables are standardized, i.e. the mean is subtracted and the result is divided by the standard deviation.
  • Figure 4: t-SNE [JMLR:v9:vandermaaten08a] visualization of the effect of DNN transformation across distance metrics. Each row compares the original input space (left) with the transformed space (right).
  • Figure 5: One-vs-rest ROC AUC metrics for all classes and reported architectures.
  • ...and 1 more figure
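
The masked-reconstruction setup mentioned in the Figure 3 caption (standardized variables, 30% masking probability) might be prepared along the following lines. This is only a minimal sketch in plain Python, not the authors' implementation: the function names, the choice of 0 as the mask value, the toy Gaussian data, and the use of a mean squared error restricted to masked entries are all assumptions made for illustration.

```python
import random
import statistics

MASK_PROB = 0.30   # masking probability quoted in the Figure 3 caption
MASK_VALUE = 0.0   # assumption: hidden entries are replaced by 0, the mean after standardization


def standardize(columns):
    """Subtract the mean and divide by the standard deviation, one variable at a time."""
    out = []
    for col in columns:
        mu = statistics.fmean(col)
        sigma = statistics.pstdev(col)
        sigma = sigma if sigma > 0 else 1.0  # guard against constant variables
        out.append([(x - mu) / sigma for x in col])
    return out


def mask_event(event, rng, p=MASK_PROB):
    """Return (masked_event, mask); mask[i] is True where variable i was hidden."""
    mask = [rng.random() < p for _ in event]
    masked = [MASK_VALUE if m else x for x, m in zip(event, mask)]
    return masked, mask


def masked_mse(prediction, target, mask):
    """Mean squared error over the masked variables only (0.0 if nothing was masked)."""
    errs = [(p - t) ** 2 for p, t, m in zip(prediction, target, mask) if m]
    return sum(errs) / len(errs) if errs else 0.0


# Toy usage: 8 hypothetical variables, 1000 simulated events.
rng = random.Random(0)
columns = [[rng.gauss(50.0, 10.0) for _ in range(1000)] for _ in range(8)]
events = [list(row) for row in zip(*standardize(columns))]

masked, mask = mask_event(events[0], rng)
# Trivial baseline: "predict" the mask value for every hidden variable.
baseline_error = masked_mse(masked, events[0], mask)
```

In an actual pre-training loop, the model's output for the masked event would take the place of the trivial baseline prediction, and the error would be averaged over a test split as in the figure.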