Beyond PID Controllers: PPO with Neuralized PID Policy for Proton Beam Intensity Control in Mu2e
Chenwei Xu, Jerry Yao-Chieh Hu, Aakaash Narayanan, Mattson Thieme, Vladimir Nagaslaev, Mark Austin, Jeremy Arnold, Jose Berlioz, Pierrick Hanlet, Aisha Ibrahim, Dennis Nicklaus, Jovan Mitrevski, Jason Michael St. John, Gauri Pradhan, Andrea Saewert, Kiyomi Seiya, Brian Schupbach, Randy Thurman-Keup, Nhan Tran, Rui Shi, Seda Ogrenci, Alexis Maya-Isabelle Shuping, Kyle Hazelwood, Han Liu
TL;DR
This work tackles the challenge of achieving uniform proton spill intensity for Mu2e by modeling the accelerator as a Markov Decision Process and applying Proximal Policy Optimization (PPO) to a policy that combines a neuralized PID bias with learnable actions. The approach uses a reward based on an exponential moving average (EMA) and a differentiable Mu2e simulator to train a controller. The controller's action at time $t$ is the sum of a PID term and a learned correction, $a_t = \pi(s_t) = \pi^{\text{PID}}(o_t) + \pi^{\text{action}}(a_{t-1})$, which enables real-time spill regulation. Across multiple random seeds, the method improves the Spill Duty Factor (SDF) by 13.6% over unregulated spills and by 1.6% over a PID baseline. This indicates that integrating model-based inductive bias with RL can enhance accelerator control. The findings advance automated, real-time proton beam intensity control for Mu2e, with practical implications for stable beam delivery and reduced background in precision physics experiments.
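The policy decomposition above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The observation $o_t$ is taken to be a hypothetical (error, integrated error, error derivative) triple, and the PID gains are treated as learnable parameters. The learned correction $\pi^{\text{action}}(a_{t-1})$ is a tiny hypothetical MLP on the previous action. `ema_reward` and its smoothing factor `alpha` are an illustrative guess at an EMA-based reward, because the exact form is not given here. In PPO training, this deterministic output would typically serve as the mean of a stochastic (e.g. Gaussian) policy.

```python
import numpy as np


class NeuralizedPIDPolicy:
    """Hypothetical sketch of a_t = pi_PID(o_t) + pi_action(a_{t-1})."""

    def __init__(self, kp=0.5, ki=0.1, kd=0.05, n_hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        # PID gains stored as trainable parameters (the "neuralized" PID bias).
        self.gains = np.array([kp, ki, kd], dtype=float)
        # One-hidden-layer MLP for the learned correction (illustrative sizes).
        self.w1 = rng.normal(scale=0.1, size=(n_hidden, 1))
        self.b1 = np.zeros(n_hidden)
        # Zero-initialized output layer: the policy starts out as pure PID.
        self.w2 = np.zeros((1, n_hidden))
        self.b2 = np.zeros(1)

    def pid_term(self, obs):
        # Assumed obs layout: (error, integral of error, derivative of error).
        return float(self.gains @ np.asarray(obs, dtype=float))

    def action_term(self, prev_action):
        # Learned correction computed from the previous action a_{t-1}.
        h = np.tanh(self.w1 @ np.array([prev_action], dtype=float) + self.b1)
        return float((self.w2 @ h + self.b2)[0])

    def act(self, obs, prev_action):
        # Deterministic action: PID bias plus learned residual.
        return self.pid_term(obs) + self.action_term(prev_action)


def ema_reward(intensity, ema_prev, target, alpha=0.1):
    """Hypothetical EMA-based reward: penalize the smoothed deviation from target."""
    ema = alpha * intensity + (1.0 - alpha) * ema_prev
    return -abs(ema - target), ema


policy = NeuralizedPIDPolicy()
action = policy.act((0.2, 0.05, -0.1), prev_action=0.0)
reward, ema = ema_reward(intensity=1.0, ema_prev=0.0, target=1.0)
```

Zero-initializing the correction's output layer is one plausible design choice. It makes the untrained policy behave exactly like the PID controller, so training only needs to learn a residual on top of the PID bias.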
Abstract
We introduce a novel Proximal Policy Optimization (PPO) algorithm that addresses the challenge of maintaining uniform proton beam intensity delivery in the Muon to Electron Conversion Experiment (Mu2e) at Fermi National Accelerator Laboratory (Fermilab). Our primary objective is to regulate the spill process so that it produces a consistent intensity profile. The ultimate goal is an automated controller that provides real-time feedback and calibration of the Spill Regulation System (SRS) parameters on a millisecond timescale. We treat the Mu2e accelerator system as a Markov Decision Process suitable for Reinforcement Learning (RL), and we use PPO to reduce bias and enhance training stability. A key innovation of our approach is the integration of a neuralized Proportional-Integral-Derivative (PID) controller into the policy function. This improves the Spill Duty Factor (SDF) by 13.6% and surpasses the current PID controller baseline by an additional 1.6%. This paper presents preliminary offline results based on a differentiable simulator of the Mu2e accelerator. It lays the groundwork for real-time implementations and applications, representing a crucial step towards automated proton beam intensity control for the Mu2e experiment.
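For reference, the SDF metric can be sketched as follows. This assumes the common slow-extraction definition $\mathrm{SDF} = \langle I\rangle^2 / \langle I^2\rangle$ over a sampled spill intensity $I(t)$. The exact formula and sampling used in this work are not stated in this section, and the ripple example is purely illustrative. Under this definition, a perfectly flat spill scores 1.0, and any ripple lowers the value. By the Cauchy-Schwarz inequality, the SDF never exceeds 1.0.

```python
import numpy as np


def spill_duty_factor(intensity):
    """Assumed SDF definition: <I>^2 / <I^2>, equal to 1.0 for a flat spill."""
    i = np.asarray(intensity, dtype=float)
    mean_sq = np.mean(i ** 2)
    if mean_sq == 0.0:
        raise ValueError("intensity must not be identically zero")
    return np.mean(i) ** 2 / mean_sq


# Illustrative spills: 1000 samples covering exactly 50 ripple periods.
t = np.arange(1000) / 1000.0
flat = np.ones_like(t)
rippled = 1.0 + 0.5 * np.sin(2.0 * np.pi * 50.0 * t)

sdf_flat = spill_duty_factor(flat)        # 1.0
sdf_rippled = spill_duty_factor(rippled)  # approximately 0.889 (= 1 / 1.125)
```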
