Physics Event Classification Using Large Language Models

Cristiano Fanelli; James Giroux; Patrick Moran; Hemalata Nayak; Karthik Suresh; Eric Walter

TL;DR

The paper evaluates whether a Large Language Model (ChatGPT-3.5) can drive a physics-focused ML task under tight access constraints. To test this, the authors organized an 8-hour hackathon in which teams used a Streamlit web interface and AWS GPU compute to classify GlueX BCAL showers as neutrons or photons, using 14 features in two phase-space regimes. All teams reached accuracies above 99%, and one team, "Jets", achieved top performance while using the fewest tokens. This illustrates the viability of LLM-assisted, domain-specific ML workflows in experimental physics. The paper also documents the infrastructure, scoring, and data-collection practices, to support future prompt-engineering research and education within the AI4EIC program. Overall, the work highlights practical pathways for integrating LLMs into physics data analysis and ML toolchains, and outlines opportunities for systematic studies of prompt strategies in domain-specific tasks.

Abstract

The 2023 AI4EIC hackathon was the culmination of the third annual AI4EIC workshop at The Catholic University of America. The workshop brought together researchers from physics, data science, and computer science to discuss the latest developments in Artificial Intelligence (AI) and Machine Learning (ML) for the Electron Ion Collider (EIC), including applications for detectors, accelerators, and experimental control. The hackathon, held on the final day of the workshop, challenged participants to use a chatbot powered by a Large Language Model, ChatGPT-3.5, to train a binary classifier that distinguishes neutrons from photons in simulated data from the GlueX Barrel Calorimeter. Six teams of up to four participants from around the world took part in this intense educational and research event. This article describes the hackathon challenge, the resources and methodology used, and the results and insights gained from analyzing physics data with state-of-the-art AI/ML tools.
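The hackathon solutions themselves were code generated by ChatGPT-3.5; the sketch below is not that code. It is a minimal, dependency-free logistic-regression baseline meant only to show what the neutron/photon binary classification amounts to. Loud assumptions: the data are synthetic Gaussian clusters standing in for the 14 GlueX BCAL shower features, the label encoding (1 = photon, 0 = neutron) is hypothetical, and the helper names (make_toy_showers, train_logistic) are illustrative.

```python
import math
import random

N_FEATURES = 14  # the hackathon task used 14 shower features


def make_toy_showers(n, seed):
    """HYPOTHETICAL stand-in for BCAL shower features: two Gaussian clusters.
    Assumed label encoding: 1 = photon, 0 = neutron."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        mean = 1.0 if label == 1 else -1.0
        X.append([rng.gauss(mean, 1.0) for _ in range(N_FEATURES)])
        y.append(label)
    return X, y


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(X, y, lr=0.1, epochs=30):
    """Full-batch gradient descent on the mean binary cross-entropy loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of cross-entropy w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    # Threshold at probability 0.5, i.e. logit >= 0.
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0


def accuracy(w, b, X, y):
    return sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)


X_train, y_train = make_toy_showers(1000, seed=1)
X_test, y_test = make_toy_showers(500, seed=2)
w, b = train_logistic(X_train, y_train)
print(f"test accuracy on toy data: {accuracy(w, b, X_test, y_test):.3f}")
```

The toy clusters are well separated, so a linear model suffices here; on real, correlated shower features a baseline this simple may not be enough. The point is only the shape of the task: features in, a probability out, a 0.5 threshold, and accuracy as the reported metric.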

Paper Structure (17 sections, 1 equation, 4 figures)

Introduction
The Hackathon Problem
The LLM problem
The Classification Problem: PID with GlueX BCAL
Infrastructure, Resources and Methodology
Datasets
Time constraints
Context constraints
Scoring
Compute Resources
Web Application
Results: Participants Exceeding Expectations
Conclusions
Future studies based on the Hackathon
Definitions of Feature Variables
...and 2 more sections

Figures (4)

Figure 1: Sketch of the barrel calorimeter readout: (A) BCAL schematic; (B) side view of a BCAL module; (C) end view of the BCAL showing all 48 modules; and (D) end view of a single module showing the readout segmentation into four rings (inner to outer) and 16 summed readout zones, demarcated by colors. Figure from Fanelli et al. [F+M].
Figure 2: The software infrastructure developed for the hackathon. Arrows denote the flow of control through the application. Interactions between users and ChatGPT are recorded through Trubrics. A sample chat session is shown in the appendix in Fig. 4.
Figure 3: (a) The left panel shows the interface provided for submitting solutions. Users select the question to evaluate and the path to the corresponding file on the AWS instance. Once submitted, solutions are graded and the leaderboard is updated automatically. (b) The right panel shows the leaderboard at the end of the hackathon, summarizing each team's performance. Using the LLM, all participants achieved an accuracy above 99% on both questions, with differences within statistical fluctuations. However, "Jets" used the fewest tokens (8192 in total) for their submitted solutions and were the fastest of all teams.
Figure 4: Example GPT chat session. During the hackathon, a participant starts a chat session by setting a context and then asks ChatGPT questions. ChatGPT provides code and explanations, which the participant names and pushes to AWS for training. The session ends upon code submission, and the user is redirected to a portal. Usage statistics are displayed, indicating, based on token usage, when the current session will end and a new one will start.
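Figure 3 states that all teams' accuracies lie within statistical fluctuations. One standard way to make that precise, sketched below as an illustration rather than the paper's scoring procedure, is to attach a binomial standard error sqrt(p(1 - p)/N) to each accuracy and compare the difference against the combined error. The test-set size, the per-team counts, and the two-sigma threshold are hypothetical choices, not hackathon numbers.

```python
import math


def accuracy_with_error(n_correct, n_total):
    """Accuracy p and its binomial standard error sqrt(p(1 - p) / N)."""
    p = n_correct / n_total
    return p, math.sqrt(p * (1.0 - p) / n_total)


def within_fluctuations(a, b, n_sigma=2.0):
    """True if two (accuracy, error) pairs differ by less than n_sigma
    times their errors combined in quadrature (assumes independent test sets)."""
    (pa, ea), (pb, eb) = a, b
    return abs(pa - pb) < n_sigma * math.hypot(ea, eb)


# HYPOTHETICAL counts on a hypothetical 10,000-shower test set.
team_a = accuracy_with_error(9935, 10_000)
team_b = accuracy_with_error(9921, 10_000)
print(team_a, team_b, within_fluctuations(team_a, team_b))
```

Here the 0.14% gap is smaller than twice the combined error of about 0.12%, so the two teams are statistically indistinguishable. When accuracies tie in this sense, accuracy alone cannot separate teams, which is why secondary quantities such as the token count reported on the leaderboard become informative.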
