Harnessing the Power of Gradient-Based Simulations for Multi-Objective Optimization in Particle Accelerators

Kishansingh Rajput, Malachi Schram, Auralee Edelen, Jonathan Colen, Armen Kasparian, Ryan Roussel, Adam Carpenter, He Zhang, Jay Benesch

TL;DR

This paper demonstrates the power of differentiability for solving MOO problems in particle accelerators using a Deep Differentiable Reinforcement Learning (DDRL) algorithm, and shows that DDRL outperforms MFRL, BO, and GA on high-dimensional problems.

Abstract

Particle accelerator operation requires simultaneous optimization of multiple objectives. Multi-Objective Optimization (MOO) is particularly challenging due to trade-offs between the objectives. Evolutionary algorithms, such as the genetic algorithm (GA), have been leveraged for many optimization problems; by design, however, they do not apply to complex control problems. This paper demonstrates the power of differentiability for solving MOO problems in particle accelerators using a Deep Differentiable Reinforcement Learning (DDRL) algorithm. We compare the DDRL algorithm with Model Free Reinforcement Learning (MFRL), GA, and Bayesian Optimization (BO) for simultaneous optimization of heat load and trip rates in the Continuous Electron Beam Accelerator Facility (CEBAF). The underlying problem enforces strict constraints on both individual states and actions, as well as a cumulative (global) constraint on the energy requirements of the beam. A physics-based surrogate model based on real data is developed. This surrogate model is differentiable and allows back-propagation of gradients. The results are evaluated in the form of a Pareto front for the two objectives. We show that DDRL outperforms MFRL, BO, and GA on high-dimensional problems.
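
As a reading aid for the Pareto-front evaluation mentioned in the abstract, the following is a minimal sketch of non-dominated filtering for two minimized objectives (for instance heat load and trip rate). The function name `pareto_front` and the sample points are illustrative assumptions, not taken from the paper, and this is not the authors' implementation.

```python
def pareto_front(points):
    """Return the non-dominated subset of `points`.

    Assumption: every objective is minimized. The name and signature
    are hypothetical and chosen only for this illustration.
    """
    front = []
    for i, p in enumerate(points):
        dominated = False
        for j, q in enumerate(points):
            if i == j:
                continue
            # q dominates p if it is no worse in every objective
            # and strictly better in at least one.
            no_worse = all(qk <= pk for qk, pk in zip(q, p))
            strictly_better = any(qk < pk for qk, pk in zip(q, p))
            if no_worse and strictly_better:
                dominated = True
                break
        if not dominated:
            front.append(p)
    return front


# Made-up (objective_1, objective_2) pairs, purely illustrative.
candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
front = pareto_front(candidates)
```

Here `(3, 4)` is dominated by `(2, 3)` and `(5, 5)` by `(1, 5)`, so only the three trade-off points remain. The quadratic pairwise scan is chosen for clarity; it is adequate for small candidate sets.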

Paper Structure

This paper contains 14 sections, 5 equations, 10 figures, 1 table, 2 algorithms.

Figures (10)

  • Figure 1: Schematic diagram of CEBAF with two anti-parallel linacs, North and South, that accelerate the electrons.
  • Figure 2: Hypervolume calculation in 2D and its normalization using ideal points
  • Figure 3: Comparison of optimal Pareto front produced by MOBO, MOGA, MOTD3 and DDRL algorithms. The best Pareto fronts are chosen for each algorithm from 16 trials. Hypervolumes are displayed in the legend.
  • Figure 4: Time taken to converge. Each scatter point represents the median, and the error bars cover the 2$\sigma$ confidence bound over 16 trials.
  • Figure 5: Pareto coverage over time on different environments with the four algorithms. The x-axis (log scale) represents time in minutes. While MOBO takes the longest to converge, DDRL is the fastest on the large-scale (200-dimensional) problem. See also the corresponding convergence plot in terms of samples/iterations in Figure \ref{fig:ParetoCoverageStep}. Note: the time axis is shown in log scale and does not include the hypervolume at 0 seconds, which is zero for all the algorithms due to random initialization.
  • ...and 5 more figures
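
The hypervolume metric named in the captions for Figures 2 and 3 can be sketched in 2D as follows. This is a minimal illustration assuming both objectives are minimized, and assuming the normalization divides by the area of the rectangle spanned by an ideal point and a reference point; the paper's exact normalization may differ. The names `hypervolume_2d` and `normalized_hypervolume_2d` and all numbers are hypothetical.

```python
def hypervolume_2d(front, ref):
    """Area dominated by `front` and bounded by the reference point `ref`.

    Assumption: both objectives are minimized, and `ref` is worse than
    every point of interest in both objectives.
    """
    # Only points that strictly improve on the reference contribute.
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    hv = 0.0
    best_f2 = ref[1]
    for f1, f2 in pts:
        # Sweeping by increasing f1, a point adds area only if it
        # lowers the best second objective seen so far.
        if f2 < best_f2:
            hv += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return hv


def normalized_hypervolume_2d(front, ref, ideal):
    """Hypervolume divided by the ideal-to-reference rectangle area.

    Assumption: this is one plausible normalization, not necessarily
    the one used in the paper.
    """
    total = (ref[0] - ideal[0]) * (ref[1] - ideal[1])
    return hypervolume_2d(front, ref) / total


# Made-up front, reference point, and ideal point.
front = [(1, 5), (2, 3), (4, 1)]
hv = hypervolume_2d(front, ref=(6, 6))
hv_norm = normalized_hypervolume_2d(front, ref=(6, 6), ideal=(0, 0))
```

With these made-up values the dominated area splits into three horizontal strips of area 5, 8, and 4, so `hv` is 17 and `hv_norm` is 17/36. A normalized value of 1 would correspond to a front that reaches the ideal point.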