Table of Contents
Fetching ...

HEPTAPOD: Orchestrating High Energy Physics Workflows Towards Autonomous Agency

Tony Menzo, Alexander Roman, Sergei Gleyzer, Konstantin Matchev, George T. Fleming, Stefan Höche, Stephen Mrenna, Prasanth Shyamsundar

TL;DR

This work addresses the challenge of coordinating complex, multi-stage high-energy physics workflows with agentic large language models. It introduces HEPTAPOD, a framework that integrates LLMs with schema-validated tools, a line-delimited event format (evtjsonl), and run-card–driven orchestration to enable transparent, human-in-the-loop planning and execution. The authors demonstrate a representative BSM leptoquark Monte Carlo validation pipeline—spanning model generation, event generation, showering, and analysis—to show reproducibility, provenance, and robust recovery across stages. The results illustrate how an agent-guided, auditable workflow can coordinate heterogeneous software (FeynRules, MadGraph, Pythia, jet clustering, and resonance reconstruction) while preserving human oversight, with clear paths for future expansion to more domains, automated configuration synthesis, and higher degrees of autonomy.

Abstract

Many workflows in high-energy-physics (HEP) stand to benefit from recent advances in transformer-based large language models (LLMs). While early applications of LLMs focused on text generation and code completion, modern LLMs now support orchestrated agency: the coordinated execution of complex, multi-step tasks through tool use, structured context, and iterative reasoning. We introduce the HEP Toolkit for Agentic Planning, Orchestration, and Deployment (HEPTAPOD), an orchestration framework designed to bring this emerging paradigm to HEP pipelines. The framework enables LLMs to interface with domain-specific tools, construct and manage simulation workflows, and assist in common utility and data analysis tasks through schema-validated operations and run-card-driven configuration. To demonstrate these capabilities, we consider a representative Beyond the Standard Model (BSM) Monte Carlo validation pipeline that spans model generation, event simulation, and downstream analysis within a unified, reproducible workflow. HEPTAPOD provides a structured and auditable layer between human researchers, LLMs, and computational infrastructure, establishing a foundation for transparent, human-in-the-loop systems.

HEPTAPOD: Orchestrating High Energy Physics Workflows Towards Autonomous Agency

TL;DR

This work addresses the challenge of coordinating complex, multi-stage high-energy physics workflows with agentic large language models. It introduces HEPTAPOD, a framework that integrates LLMs with schema-validated tools, a line-delimited event format (evtjsonl), and run-card–driven orchestration to enable transparent, human-in-the-loop planning and execution. The authors demonstrate a representative BSM leptoquark Monte Carlo validation pipeline—spanning model generation, event generation, showering, and analysis—to show reproducibility, provenance, and robust recovery across stages. The results illustrate how an agent-guided, auditable workflow can coordinate heterogeneous software (FeynRules, MadGraph, Pythia, jet clustering, and resonance reconstruction) while preserving human oversight, with clear paths for future expansion to more domains, automated configuration synthesis, and higher degrees of autonomy.

Abstract

Many workflows in high-energy-physics (HEP) stand to benefit from recent advances in transformer-based large language models (LLMs). While early applications of LLMs focused on text generation and code completion, modern LLMs now support orchestrated agency: the coordinated execution of complex, multi-step tasks through tool use, structured context, and iterative reasoning. We introduce the HEP Toolkit for Agentic Planning, Orchestration, and Deployment (HEPTAPOD), an orchestration framework designed to bring this emerging paradigm to HEP pipelines. The framework enables LLMs to interface with domain-specific tools, construct and manage simulation workflows, and assist in common utility and data analysis tasks through schema-validated operations and run-card-driven configuration. To demonstrate these capabilities, we consider a representative Beyond the Standard Model (BSM) Monte Carlo validation pipeline that spans model generation, event simulation, and downstream analysis within a unified, reproducible workflow. HEPTAPOD provides a structured and auditable layer between human researchers, LLMs, and computational infrastructure, establishing a foundation for transparent, human-in-the-loop systems.

Paper Structure

This paper contains 42 sections, 4 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: One example of a multi-phase software workflow for a BSM leptoquark study. A model.fr file is implemented in FeynRules and exported as a UFO directory (model_UFO/). The UFO is consumed by MG5_aMC@NLO, which loads the process card model_proc.mg5 and generates parton-level events written to model_events.lhe. These events are then passed to Pythia for showering and hadronization, producing an event record $\mathcal{E}$ containing particle identifiers and four-momenta $\{\,p_x, p_y, p_z, E, \texttt{id}, \ldots\}$.
  • Figure 2: High-level architecture of HEPTAPOD, a HEP-focused agent-orchestration framework built using the Orchestral AI orchestration engine orchestral-ai. The system consists of three interacting components: (i) a database of schema-validated tools that expose domain-specific HEP capabilities via structured inputs and outputs; (ii) an agent-orchestration layer that maintains the evolving context and performs LLM-based reasoning, optionally routing inference across multiple providers; and (iii) a sandboxed tool-execution engine that interfaces with external HEP software and data utilities. Each agent interaction proceeds within an initialized sandboxed workspace, with the agent iteratively reasoning over the evolving context. At each step, the agent determines whether a tool invocation is required; when invoked, tools are executed outside the LLM within the sandbox, and their structured outputs (e.g. JSON artifacts) are injected back into the context before the next reasoning step. This closed-loop reasoning–execution process enables reliable, multi-step HEP workflows while enforcing a strict separation between language-model reasoning and external software execution.
  • Figure 3: Reconstructed minimum leptoquark mass $m^{\text{min}}_{\text{LQ}}$ distributions for three benchmark scalar leptoquark mass points: $m_{S_1} = 1.0, 1.5, 2.0~\text{TeV}$. The agent-orchestrated the full simulation chain (FeynRules, MadGraph, Pythia, jet clustering, and invariant mass reconstruction) for each scan point, producing resonance peaks centered at the corresponding generated masses. Each distribution contains 10,000 simulated events with symmetric two-body decay reconstruction $(\ell j)(\ell j)$ and $\Delta R > 0.4$ separation cuts.