Application of Structured State Space Models to High energy physics with locality-sensitive hashing

Cheng Jiang, Sitian Qian

TL;DR

This work addresses the challenge of processing HL-LHC-scale data, characterized by large point clouds and long sequences, by applying structured state-space models (SSMs) augmented with locality-sensitive hashing (LSH). It investigates pure Mamba SSMs and hybrid Transformer-Mamba architectures, each combined with OR & AND LSH, to achieve near-linear computational complexity while preserving physics performance. The results show substantial FLOP reductions (over 10×) and superior or comparable tracking and pileup metrics, with notable gains in recall and throughput, indicating a practical path for HL-LHC data analysis. Overall, the approach offers an efficient alternative to full transformer backbones for HEP tasks that exhibit strong local inductive bias, enabling faster inference without sacrificing key physics outcomes.

Abstract

Modern high-energy physics (HEP) experiments are increasingly challenged by the vast size and complexity of their datasets, particularly regarding large-scale point cloud processing and long sequences. To address these challenges, we explore the application of structured state space models (SSMs), presenting one of the first attempts to integrate locality-sensitive hashing into either a hybrid or a pure Mamba model. Our results demonstrate that pure SSMs could serve as powerful backbones for HEP tasks on long-sequence data with local inductive bias. By integrating locality-sensitive hashing into Mamba blocks, we achieve significant improvements over traditional backbones in key HEP tasks, surpassing them in inference speed and physics metrics while reducing computational overhead. In key tests, our approach presents a viable alternative to traditional transformer backbones, significantly reducing FLOPs while maintaining robust performance.
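
As a rough illustration of the OR & AND LSH idea mentioned in the TL;DR, the sketch below uses the textbook E2LSH hash, floor((a·x + b) / w), with an AND over the hashes inside one table and an OR across tables. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names `e2lsh_codes` and `candidate_pairs`, the parameters `n_tables`, `n_hashes`, and `width`, and the brute-force pair enumeration are all hypothetical choices made for readability.

```python
import numpy as np


def e2lsh_codes(points, n_tables=4, n_hashes=2, width=1.0, seed=0):
    """Return integer E2LSH codes of shape (n_tables, n_points, n_hashes).

    Each code is floor((a . x + b) / width) with Gaussian a and uniform b.
    All parameter names here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    dim = points.shape[1]
    a = rng.normal(size=(n_tables, dim, n_hashes))          # random projections
    b = rng.uniform(0.0, width, size=(n_tables, 1, n_hashes))  # random offsets
    proj = np.einsum("nd,tdk->tnk", points, a)
    return np.floor((proj + b) / width).astype(np.int64)


def candidate_pairs(codes):
    """AND within a table (all hashes must match), OR across tables."""
    pairs = set()
    for table in codes:
        buckets = {}
        # The full tuple of hashes is the bucket key: this is the AND step.
        for idx, key in enumerate(map(tuple, table)):
            buckets.setdefault(key, []).append(idx)
        # Collecting pairs from every table into one set is the OR step.
        for members in buckets.values():
            for pos, p in enumerate(members):
                for q in members[pos + 1:]:
                    pairs.add((p, q))
    return pairs


if __name__ == "__main__":
    hits = np.random.default_rng(1).normal(size=(200, 3))
    found = candidate_pairs(e2lsh_codes(hits, width=0.5))
    print(f"{len(found)} candidate pairs out of {200 * 199 // 2} possible")
```

Raising `n_hashes` makes each bucket more selective (fewer false neighbours), while raising `n_tables` recovers true neighbours that a single table would miss; how the paper balances the two is not stated in this excerpt.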

Paper Structure

This paper contains 20 sections, 6 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: An illustration of different structured mask types applied to attention maps, followed by the integration of locality-sensitive hashing (LSH). The causal mask corresponds to linear attention transformers, the decay mask to the retentive network, and the semi-separable mask to structured SSMs. In this context, LSH groups similar query-key values together, allowing the semi-separable mask to select the relevant blocks from neighboring regions while minimizing interference from unrelated areas.
  • Figure 2: A schematic diagram of the two proposed architectures for tasks with local inductive bias. Left: The Mamba-a architecture, inspired by the hybrid Transformer-Mamba model (i.e., Jamba), excludes the MoE layer to reduce training memory requirements. Right: The Mamba-b architecture integrates the OR & AND E2LSH selection and bucketing mechanism into pure Mamba blocks. The final loss for both models is computed from the embedding output using the Info Noise-Contrastive Estimation (InfoNCE) loss, constructed from predefined kNN edge pairs.
  • Figure 3: Performance plots for average inference FLOPs of various small-sized models across different numbers of hits, ranging from 3k to 60k. (All tests were performed on the actual dataset and evaluated on a single NVIDIA A100, ensuring realistic performance evaluation rather than relying on toy points.)
  • Figure 4: Performance plots for average throughputs (millions of hits) of various small-sized models across different numbers of hits, ranging from 3k to 60k. (FLOPs alone do not fully reflect raw inference time, as the PCT and Flatformer cases clearly show.)
  • Figure 5: Top-1 accuracy of different-sized models on the Tracking-60k scenario. Mamba-a demonstrates better accuracy than all other larger models, while Mamba-b achieves accuracy comparable to the previous SOTA model.
  • ...and 1 more figures
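
The three mask types named in the Figure 1 caption can be sketched numerically. The snippet below follows the commonly used structured-masked-attention formulation, in which the output is (L ∘ QKᵀ)V and only the lower-triangular mask L changes; it is an assumption that the paper uses this exact parameterization, and the helper names (`causal_mask`, `decay_mask`, `semiseparable_mask`, `masked_attention`) are hypothetical.

```python
import numpy as np


def causal_mask(n):
    """Linear-attention style mask: L[i, j] = 1 for i >= j, else 0."""
    return np.tril(np.ones((n, n)))


def decay_mask(n, gamma):
    """Retentive-network style mask: L[i, j] = gamma**(i - j) for i >= j."""
    i, j = np.indices((n, n))
    return np.where(i >= j, gamma ** np.maximum(i - j, 0), 0.0)


def semiseparable_mask(a):
    """SSM style mask: L[i, j] = a[j+1] * ... * a[i] for i >= j, else 0.

    `a` holds one positive, input-dependent decay factor per position.
    """
    a = np.asarray(a, dtype=float)
    i, j = np.indices((a.size, a.size))
    log_cum = np.cumsum(np.log(a))
    diff = log_cum[:, None] - log_cum[None, :]   # sum of logs over (j, i]
    return np.exp(np.where(i >= j, diff, -np.inf))


def masked_attention(q, k, v, mask):
    """Apply a structured mask elementwise to Q K^T, then mix the values."""
    return (mask * (q @ k.T)) @ v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.normal(size=(6, 4)) for _ in range(3))
    a = rng.uniform(0.5, 1.0, size=6)
    print(masked_attention(q, k, v, semiseparable_mask(a)).shape)
```

With a constant decay factor the semi-separable mask reduces to the decay mask, and with all factors equal to one it reduces to the causal mask, which is one way to read the progression shown in the figure.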