Scalable AI-assisted Workflow Management for Detector Design Optimization Using Distributed Computing

Derek Anderson, Amit Bashyal, Markus Diefenthaler, Cristiano Fanelli, Wen Guan, Tanja Horn, Alex Jentsch, Meifeng Lin, Tadashi Maeno, Kei Nagai, Hemalata Nayak, Connor Pecar, Karthik Suresh, Fang-Ying Tsai, Anselm Vossen, Tianle Wang, Torre Wenaus

Abstract

The Production and Distributed Analysis (PanDA) system, originally developed for the ATLAS experiment at the CERN Large Hadron Collider (LHC), has evolved into a robust platform for orchestrating large-scale workflows across distributed computing resources. Coupled with its intelligent Data Delivery Service (iDDS) component, PanDA supports AI/ML-driven workflows through a scalable and flexible workflow engine. We present an AI-assisted framework for detector design optimization that integrates multi-objective Bayesian optimization with the PanDA–iDDS workflow engine to coordinate iterative simulations across heterogeneous resources. The framework addresses the challenge of exploring high-dimensional parameter spaces inherent in modern detector design. We demonstrate the framework using benchmark problems and realistic studies of the ePIC and dRICH detectors for the Electron-Ion Collider (EIC). Results show improved automation, scalability, and efficiency in multi-objective optimization. This work establishes a flexible and extensible paradigm for AI-driven detector design and other computationally intensive scientific applications.
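To make the multi-objective setting concrete, the following is a minimal sketch of the core ingredients of such an optimization loop: the DTLZ2 benchmark used later in the paper and a Pareto-dominance filter that extracts the non-dominated set from a batch of evaluations. This is an illustrative stand-in only; it uses random proposals rather than the Bayesian surrogate model of the actual framework, and all function names are our own.

```python
import math
import random

def dtlz2(x, m=3):
    """DTLZ2 benchmark: map a point x in [0,1]^n to m objectives (all minimized).
    On the true Pareto front, g == 0 and sum(f_i^2) == 1."""
    g = sum((xi - 0.5) ** 2 for xi in x[m - 1:])
    f = []
    for i in range(m):
        fi = 1.0 + g
        for j in range(m - 1 - i):          # product of cosines
            fi *= math.cos(x[j] * math.pi / 2)
        if i > 0:                            # trailing sine term
            fi *= math.sin(x[m - 1 - i] * math.pi / 2)
        f.append(fi)
    return f

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(ai <= bi for ai, bi in zip(a, b)) and any(ai < bi for ai, bi in zip(a, b))

def pareto_front(points):
    """Keep only the non-dominated objective vectors from a batch."""
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]

random.seed(0)
n_params, n_obj = 6, 3
# A real optimizer would propose these points with a surrogate model;
# random sampling stands in for that step here.
evals = [dtlz2([random.random() for _ in range(n_params)], n_obj) for _ in range(200)]
front = pareto_front(evals)
print(len(front), min(sum(fi ** 2 for fi in f) for f in front))
```

Since sum(f_i^2) = (1 + g)^2 for DTLZ2, the printed minimum approaches 1 from above as the sampled front approaches the true one.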

Paper Structure

This paper contains 10 sections and 6 figures.

Figures (6)

  • Figure 1: An integrated workflow with PanDA and iDDS, where iDDS automates complex and dynamic workflows, and PanDA schedules workloads to large-scale distributed heterogeneous computing resources.
  • Figure 2: AI integration with PanDA/iDDS. iDDS maps AI pipeline functions to remote tasks executed by PanDA across distributed resources and asynchronously aggregates results, enabling a workflow that behaves like local function execution.
  • Figure 3: AID(2)E workflow. (a) The AI-driven framework proposes detector design parameters for multiple objectives, which are evaluated through simulation. (b) With the Function-as-a-Task paradigm, local functions are transformed into PanDA jobs and executed on distributed resources, enabling scalable and transparent workflow execution.
  • Figure 4: High-level workflow of AID(2)E. Closure test 1 validates AI-assisted optimization using benchmark problems with known Pareto fronts. Closure test 2 evaluates distributed execution across heterogeneous resources using PanDA/iDDS. In the full integration stage, benchmark functions are replaced with compute-intensive ePIC simulation and reconstruction tasks.
  • Figure 5: DTLZ2 benchmark results for 5 objectives and 100 parameters. Distributed execution achieves similar convergence while enabling higher concurrency. Runtime is dominated by optimization overhead rather than objective evaluation.
  • ...and 1 more figure
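The Function-as-a-Task paradigm described in Figures 2 and 3 can be sketched as a decorator that, instead of running a function locally, submits it for execution elsewhere and returns a handle for asynchronous result aggregation. The sketch below is a hypothetical local stand-in using Python's `concurrent.futures`; the real PanDA/iDDS client would serialize the function into a job specification and dispatch it to distributed resources, and its API is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the PanDA/iDDS submission layer: a thread pool plays the role
# of the distributed backend so the sketch runs self-contained.
_executor = ThreadPoolExecutor(max_workers=4)

def as_task(fn):
    """Decorator: calling the wrapped function submits it for execution and
    immediately returns a Future instead of blocking on the result."""
    def submit(*args, **kwargs):
        return _executor.submit(fn, *args, **kwargs)
    return submit

@as_task
def evaluate_design(params):
    # Placeholder for a compute-intensive simulation/reconstruction step.
    return sum(p * p for p in params)

# The workflow behaves like local function calls: submit many evaluations,
# then aggregate their results asynchronously.
futures = [evaluate_design([i, i + 1]) for i in range(3)]
results = [f.result() for f in futures]
print(results)  # [1, 5, 13]
```

The key property, mirrored from the figure captions, is that the call site is unchanged: the decorator alone decides whether the function runs locally or as a remote task.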