Large Language Models for Human-Machine Collaborative Particle Accelerator Tuning through Natural Language

Jan Kaiser; Annika Eichler; Anne Lauscher

TL;DR

The paper investigates whether large language models (LLMs) can autonomously tune a particle accelerator subsystem via natural-language prompts. It evaluates four prompting schemes (Tuning, Explained, Chain-of-Thought, Optimisation) across 14 LLMs and benchmarks them against two state-of-the-art optimisation methods, Bayesian optimisation (BO) and reinforcement learning-trained optimisation (RLO), on a transverse beam tuning task in the experimental area (EA) of the ARES accelerator at DESY. Results show that some LLM/prompt combinations achieve improvements, but they generally underperform BO and RLO in both final accuracy and convergence speed, and they incur higher computational and environmental costs. The study is a proof of concept for natural-language-driven tuning and points to future roles for LLMs as copilots or coordinators in accelerator operations rather than as replacements for existing optimisers.

Abstract

Autonomous tuning of particle accelerators is an active and challenging field of research with the goal of enabling novel accelerator technologies and cutting-edge high-impact applications, such as physics discovery, cancer research and material sciences. A key challenge with autonomous accelerator tuning remains that the most capable algorithms require an expert in optimisation, machine learning or a similar field to implement the algorithm for every new tuning task. In this work, we propose the use of large language models (LLMs) to tune particle accelerators. We demonstrate on a proof-of-principle example the ability of LLMs to successfully and autonomously tune a particle accelerator subsystem based on nothing more than a natural language prompt from the operator, and compare the performance of our LLM-based solution to state-of-the-art optimisation algorithms, such as Bayesian optimisation (BO) and reinforcement learning-trained optimisation (RLO). In doing so, we also show how LLMs can perform numerical optimisation of a highly non-linear real-world objective function. Ultimately, this work represents yet another complex task that LLMs are capable of solving and promises to help accelerate the deployment of autonomous tuning algorithms to the day-to-day operations of particle accelerators.

Paper Structure (14 sections, 1 equation, 5 figures)

Introduction
Related Work
Tuning Particle Accelerators Through Natural Language
Optimisation Scheme
Tuning Prompt
Explained Prompt
Chain-of-Thought Prompt
Optimisation Prompt
Evaluation
Method
Results
Conclusion and Outlook
System Prompts
Failed Responses

Figures (5)

Figure 1: Schematic of the EA section of the ARES linear particle accelerator. Quadrupole magnets are shown in red; the vertical and horizontal dipoles are shown in blue and turquoise, respectively. The electron beam is shown as a green envelope passing through the magnets and onto the screen at the end of the experimental area.
Figure 2: Flowchart of the optimisation scheme used to tune particle accelerators using LLMs. The prompt is made up of three components: a task description, a list of previous input and output samples, and instructions on what to output and how to format the output. The prompt is then sent to the LLM, which generates a response. The response is parsed into values that can be input into the tuning / optimisation task. A measurement or objective value from the task is then inserted into the previous samples along with its corresponding input, and the loop is repeated.
Figure 3: Number of successful runs for each model and prompt (a) and the number of wholly successful trials, i.e. trials where all three runs were successful (b). We define as success an improvement of at least 40 on the beam differences when compared to the initial magnet settings.
Figure 4: Magnet setting and beam parameter traces for a good and a bad tuning run by an LLM. Both runs used the same trial, where the target beam parameters are $\mu_x = \mu_y = \sigma_x = \sigma_y = 0\,\mathrm{mm}$.
Figure 5: Number of successful tuning runs, average normalised MAE improvement and average normalised accumulated MAE for each LLM with respect to its size, LMSYS Chatbot Arena ELO rating, MT-bench score, MMLU score and HellaSwag score. Results for the Explained Prompt are shown in black and results for the Optimisation Prompt are shown in blue. Linear fits are shown for the presented data. If larger model size or higher benchmark scores improve the ability of LLMs to solve the investigated particle accelerator tuning task, we expect the number of successful runs to increase and the other two metrics to decrease.
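
The loop described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the JSON output format, the helper names (`build_prompt`, `parse_response`, `tune`), the toy objective and the canned `fake_llm` are all assumptions made for the example. Skipping a step when the response cannot be parsed is likewise an assumed policy.

```python
import json
import re
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[List[float], float]  # (inputs, objective value)

# Hypothetical prompt text; the paper's actual prompts are not reproduced here.
TASK_DESCRIPTION = "Choose magnet settings that minimise the objective."
OUTPUT_INSTRUCTIONS = "Reply with the next settings as a JSON list of numbers."


def build_prompt(samples: Sequence[Sample]) -> str:
    """Assemble the three components: task, previous samples, output format."""
    history = "\n".join(
        f"inputs: {json.dumps(inputs)} -> objective: {value:.4f}"
        for inputs, value in samples
    )
    return (
        f"{TASK_DESCRIPTION}\n\nPrevious samples:\n{history}\n\n"
        f"{OUTPUT_INSTRUCTIONS}"
    )


def parse_response(response: str, n_inputs: int) -> List[float]:
    """Extract the first bracketed list from the response as n_inputs floats."""
    match = re.search(r"\[[^\[\]]*\]", response)
    if match is None:
        raise ValueError("no list found in response")
    values = json.loads(match.group(0))  # JSONDecodeError is a ValueError
    if len(values) != n_inputs:
        raise ValueError("wrong number of values")
    return [float(v) for v in values]


def tune(
    llm: Callable[[str], str],
    objective: Callable[[Sequence[float]], float],
    initial_inputs: Sequence[float],
    n_steps: int,
) -> List[Sample]:
    """Run the prompt -> response -> parse -> evaluate loop for n_steps."""
    samples: List[Sample] = [(list(initial_inputs), objective(initial_inputs))]
    for _ in range(n_steps):
        response = llm(build_prompt(samples))
        try:
            inputs = parse_response(response, len(initial_inputs))
        except (ValueError, TypeError):
            continue  # assumed policy: skip steps with unparseable responses
        samples.append((inputs, objective(inputs)))
    return samples


def toy_objective(inputs: Sequence[float]) -> float:
    """Toy stand-in for the task: mean absolute error to a fixed target."""
    target = [1.0, 2.0]
    return sum(abs(x - t) for x, t in zip(inputs, target)) / len(target)


# Canned responses stand in for a real LLM call; the second one fails to parse.
_canned = iter(["Try [0.5, 1.5]", "I am not sure.", "Next: [1.0, 2.0]"])


def fake_llm(prompt: str) -> str:
    return next(_canned)


samples = tune(fake_llm, toy_objective, [0.0, 0.0], n_steps=3)
best_inputs, best_value = min(samples, key=lambda sample: sample[1])
```

In a real setting, `fake_llm` would be replaced by a call to an actual model and `toy_objective` by a measurement from the accelerator or a simulation; the structure of the loop stays the same.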
