A Living Review Pipeline for AI/ML Applications in Accelerator Physics
Adnan Ghribi
TL;DR
This work delivers an open-source, automated living review pipeline for AI/ML applications in accelerator physics, a field that moves quickly and spans several disciplines. The pipeline harvests publications from multiple bibliographic sources, deduplicates entries, and applies semantic filtering against three reference anchors to retain works at the accelerator–ML intersection. It then classifies the retained papers by theme and exports them in several formats. The resulting corpus for 2000–2025 contains about $N=244$ papers (approximately 2% of the raw results), enabling quantitative trend analyses, case studies, and near real-time updates within a FAIR framework. As an extensible, reproducible, and community-driven resource, the pipeline supports transparent literature curation, accelerates cross-disciplinary discovery, and fosters responsible adoption of AI/ML in accelerator science.
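The anchor-based filter summarized above can be illustrated with a minimal Python sketch. It assumes that each paper and each reference anchor has already been embedded as a vector (in the pipeline, by a sentence-transformer model; here, hand-written toy vectors). It also assumes a decision rule that the summary does not spell out: a paper is kept only if its cosine similarity to every anchor reaches a single threshold. The anchor names, vectors, paper IDs, and the 0.5 threshold are hypothetical, not the pipeline's calibrated values.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def anchor_scores(paper: Vector, anchors: Dict[str, Vector]) -> Dict[str, float]:
    """Similarity of one paper embedding to each reference anchor."""
    return {name: cosine(paper, vec) for name, vec in anchors.items()}


def semantic_filter(papers: Dict[str, Vector],
                    anchors: Dict[str, Vector],
                    threshold: float) -> List[str]:
    """Return IDs of papers whose similarity to EVERY anchor is >= threshold.

    Requiring all anchors (rather than any one) is an assumed way of
    enforcing the accelerator-ML intersection.
    """
    kept = []
    for paper_id, vec in papers.items():
        scores = anchor_scores(vec, anchors)
        if min(scores.values()) >= threshold:
            kept.append(paper_id)
    return kept


# Hypothetical 3-d toy vectors standing in for sentence-transformer embeddings.
ANCHORS = {
    "accelerator": [1.0, 0.0, 0.0],
    "machine_learning": [0.0, 1.0, 0.0],
    "ml_for_accelerators": [1.0, 1.0, 0.0],
}
PAPERS = {
    "ml_beam_tuning": [1.0, 1.0, 0.1],        # accelerator + ML content
    "magnet_design": [1.0, 0.0, 0.5],         # accelerator only
    "image_classification": [0.0, 0.2, 1.0],  # neither topic dominates
}

if __name__ == "__main__":
    print(semantic_filter(PAPERS, ANCHORS, threshold=0.5))
```

Requiring every anchor to match, rather than any one of them, is what keeps a pure accelerator paper such as the hypothetical `magnet_design` out of the corpus; a real implementation could instead use per-anchor thresholds chosen during calibration.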
Abstract
We present an open-source pipeline for generating a \emph{living review} of artificial intelligence (AI) and machine learning (ML) applications in accelerator physics and technology. Traditional review articles provide static snapshots that the rapid pace of research quickly renders outdated. The presented system automatically harvests publications from multiple bibliographic sources (arXiv, InspireHEP, HAL, OpenAlex, Crossref, and Springer), deduplicates entries, applies semantic filtering to ensure relevance to both accelerators and ML, and classifies papers into thematic categories. The resulting curated dataset is exported in JSON, HTML, PDF, and Bib\TeX{} formats, enabling continuous updates and integration with web frameworks. We describe the methodology, including semantic similarity filtering with sentence-transformer embeddings, threshold calibration, and expert-informed classification. The results show that the pipeline robustly filters $\sim$12000 raw papers/month down to a focused corpus of $\sim$2\% relevant works. The pipeline provides the basis for an evolving, community-driven review of AI/ML in accelerator science.
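Deduplication across the harvested sources can be sketched in the same spirit. The sketch below is an assumption about how such merging might work, not the pipeline's documented logic: records are matched first on a normalized DOI and otherwise on a normalized title, and the names of all contributing sources are kept on the merged entry. The record fields (`source`, `title`, `doi`), the titles, and the `10.1234/...` DOI are hypothetical.

```python
import re
from typing import Dict, List, Optional


def normalize_doi(doi: Optional[str]) -> Optional[str]:
    """Lower-case a DOI and strip common URL or 'doi:' prefixes."""
    if not doi:
        return None
    doi = doi.strip().lower()
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi)
    return doi or None


def normalize_title(title: str) -> str:
    """Lower-case a title and collapse punctuation/whitespace runs to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records: List[Dict]) -> List[Dict]:
    """Merge records that share a normalized DOI or, failing that, a normalized title."""
    merged: List[Dict] = []
    by_doi: Dict[str, int] = {}
    by_title: Dict[str, int] = {}
    for rec in records:
        doi = normalize_doi(rec.get("doi"))
        title = normalize_title(rec.get("title", ""))
        # Prefer a DOI match; fall back to an exact normalized-title match.
        idx = by_doi.get(doi) if doi else None
        if idx is None and title:
            idx = by_title.get(title)
        if idx is None:
            entry = {k: v for k, v in rec.items() if k != "source"}
            entry["doi"] = doi
            entry["sources"] = [rec["source"]]
            merged.append(entry)
            idx = len(merged) - 1
        else:
            entry = merged[idx]
            if rec["source"] not in entry["sources"]:
                entry["sources"].append(rec["source"])
            if entry["doi"] is None and doi:
                entry["doi"] = doi  # fill in a DOI learned from a later source
        if doi:
            by_doi[doi] = idx
        if title:
            by_title.setdefault(title, idx)
    return merged


# Hypothetical records: one paper seen by three sources, plus an unrelated paper.
RECORDS = [
    {"source": "arXiv", "title": "Neural-Network Surrogates for Beam Dynamics", "doi": None},
    {"source": "Crossref", "title": "Neural network surrogates for beam dynamics",
     "doi": "https://doi.org/10.1234/EXAMPLE.1"},
    {"source": "OpenAlex", "title": "Neural Network Surrogates for Beam Dynamics.",
     "doi": "10.1234/example.1"},
    {"source": "HAL", "title": "Anomaly Detection in RF Cavities", "doi": None},
]

if __name__ == "__main__":
    for paper in deduplicate(RECORDS):
        print(paper["title"], paper["doi"], paper["sources"])
```

Falling back to the title lets a preprint without a DOI merge with its published version. Titles that differ in wording rather than punctuation would need fuzzier matching than this exact comparison.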
