A Living Review Pipeline for AI/ML Applications in Accelerator Physics

Adnan Ghribi

TL;DR

This work delivers an open-source, automated living review pipeline for AI/ML applications in accelerator physics, addressing the rapid pace and cross-disciplinary nature of the field. It harvests publications from multiple bibliographic sources, deduplicates entries, and applies semantic filtering against three reference anchors to retain works at the accelerator–ML intersection, followed by thematic classification and multi-format export. The resulting corpus for 2000–2025 contains $N=244$ papers (approximately 2% of raw results), enabling quantitative trend analyses, case studies, and near real-time updates within a FAIR framework. By providing an extensible, reproducible, and community-driven resource, the pipeline supports transparent literature curation, accelerates cross-disciplinary discovery, and fosters responsible adoption of AI/ML in accelerator science.

Abstract

We present an open-source pipeline for generating a \emph{living review} of artificial intelligence (AI) and machine learning (ML) applications in accelerator physics and technologies. Traditional review articles provide static snapshots that are quickly outdated by the rapid pace of research. The presented system automatically harvests publications from multiple bibliographic sources (arXiv, InspireHEP, HAL, OpenAlex, Crossref, and Springer), deduplicates entries, applies semantic filtering to ensure accelerator and ML relevance, and classifies papers into thematic categories. The curated dataset is exported in JSON, HTML, PDF, and Bib\TeX formats, enabling continuous updates and integration with web frameworks. We describe the methodology, including semantic similarity filtering using sentence-transformer embeddings, threshold calibration, and expert-informed classification. The results demonstrate robust filtering of $\sim$12000 raw papers/month into a focused corpus of $\sim$2\% relevant works. The pipeline provides the basis for an evolving, community-driven review of AI/ML in accelerator science.
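To make the deduplication and anchor-based filtering steps concrete, the following is a minimal sketch and not the pipeline's actual code. Several things are assumptions made for illustration: the record fields (`doi`, `title`), the anchor texts, the threshold value, and the rule that a paper must be similar to all three anchors. The `embed` function is a toy bag-of-words stand-in so the example runs without downloads; the pipeline itself is described as using sentence-transformer embeddings.

```python
import math
import re
from collections import Counter


def normalize_title(title):
    """Lowercase and collapse punctuation so near-identical titles compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep the first record seen for each DOI or normalized title."""
    seen, unique = set(), []
    for rec in records:
        keys = {normalize_title(rec["title"])}
        if rec.get("doi"):
            keys.add(rec["doi"].lower())
        if keys & seen:  # already harvested from another source
            continue
        seen |= keys
        unique.append(rec)
    return unique


def embed(text):
    """Toy stand-in for a sentence-transformer: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical anchors and threshold; the real values are not given in this text.
ANCHORS = {
    "accelerator": "particle accelerator beam dynamics linac synchrotron",
    "ml": "machine learning neural network reinforcement learning",
    "intersection": "machine learning for particle accelerator beam control",
}
THRESHOLD = 0.1


def is_relevant(text, anchors=ANCHORS, threshold=THRESHOLD):
    """Assumed rule: keep a paper only if it is similar to every anchor."""
    vec = embed(text)
    return all(cosine(vec, embed(a)) >= threshold for a in anchors.values())


# Made-up records for illustration only.
records = [
    {"doi": "10.0000/example.1",
     "title": "Reinforcement learning for beam control in a particle accelerator"},
    {"doi": "10.0000/EXAMPLE.1",
     "title": "Reinforcement Learning for Beam Control in a Particle Accelerator."},
    {"doi": None, "title": "Deep neural network for image classification"},
    {"doi": None, "title": "Beam dynamics in a synchrotron"},
]

unique = deduplicate(records)
kept = [r for r in unique if is_relevant(r["title"])]
```

In this toy run the duplicate entry is dropped first, and only the reinforcement-learning paper passes all three anchors: the image-classification paper has no accelerator vocabulary, and the synchrotron paper has no ML vocabulary. Swapping `embed` for a real sentence-embedding model would leave the rest of the logic unchanged, although the threshold would need recalibrating.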

Paper Structure

This paper contains 29 sections, 5 equations, 3 figures.

Figures (3)

  • Figure 1: System architecture of the living_review pipeline. Publications are collected from multiple sources, deduplicated, filtered by semantic relevance (including noise suppression and keyword exclusion), classified, and exported in multiple formats for dissemination.
  • Figure 2: Publication trends at two time scales: (a) annual overview for 2000-2025, showing long-term growth with significant acceleration from 2021 onwards and peak activity in 2024-2025 (highlighted region); (b) detailed monthly activity for 2022-2025, showing short-term fluctuations and sustained productivity with notable peaks, with a 3-month moving-average trend line for clarity.
  • Figure 3: Thematic distribution: (a) research categories ranked by publication count; (b) keyword frequency distribution revealing specific research topics.
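
The 3-month moving average mentioned in the Figure 2 caption can be sketched as follows. This is an illustrative example only: it assumes a trailing window (the caption does not say whether the window is trailing or centered), and the monthly counts are made up.

```python
def moving_average(counts, window=3):
    """Trailing moving average; early points average whatever history exists."""
    smoothed = []
    for i in range(len(counts)):
        chunk = counts[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Made-up monthly publication counts, not data from the paper.
monthly_counts = [3, 6, 9, 12]
trend = moving_average(monthly_counts)
```

A trailing window uses only past months, so the trend line can be extended as new months are harvested without revising earlier points.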