Improved Robustness of Deep Reinforcement Learning for Control of Time-Varying Systems by Bounded Extremum Seeking

Shaifalee Saxena, Alan Williams, Rafael Fierro, Alexander Scheinker

TL;DR

The paper addresses robustness gaps in deep reinforcement learning for nonlinear time-varying control by integrating bounded extremum seeking (ES) with a DRL policy. The ES layer provides model-free robustness to unknown and drifting control directions, while DRL offers rapid, data-driven control learned from historical trajectories; a safety supervisor blends the two, and ES is warm-started from the DRL action to reduce transients. The authors demonstrate the hybrid ES–DRL controller on general time-varying dynamics and on a particle-accelerator-inspired KV envelope model of the Los Alamos LANSCE LEBT, showing superior performance over DRL or ES alone, especially under distribution shifts and parameter drift. The approach promises safer, more reliable deployment of learning-based controllers in high-dimensional, safety-critical domains such as accelerators; hardware-in-the-loop tests and a theoretical convergence analysis are planned as future work.

Abstract

In this paper, we study the use of robust, model-independent bounded extremum seeking (ES) feedback control to improve the robustness of deep reinforcement learning (DRL) controllers for a class of nonlinear time-varying systems. DRL has the potential to learn from large datasets to quickly control or optimize the outputs of many-parameter systems, but its performance degrades catastrophically when the system model changes rapidly over time. Bounded ES can handle time-varying systems with unknown control directions, but its convergence slows as the number of tuned parameters increases and, like all local adaptive methods, it can get stuck in local minima. We demonstrate that, together, DRL and bounded ES form a hybrid controller whose performance exceeds the sum of its parts: DRL takes advantage of historical data to learn how to quickly control a many-parameter system to a desired setpoint, while bounded ES ensures robustness to time variations. We present a numerical study of a general time-varying system and a combined ES-DRL controller for automatic tuning of the Low Energy Beam Transport section at the Los Alamos Neutron Science Center linear particle accelerator.

Paper Structure

This paper contains 13 sections, 35 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Maximizing $V(x)=\exp(-x^{2})$ under a sinusoidally varying control direction $b(t)=b_0\cos(2\pi f t)$. (a) Low $f$: DRL reaches high $V$ temporarily, but diverges during large swings of $b(t)$; ES reaches and maintains high $V$. (b) High $f$: DRL diverges; ES maintains high $V$ after convergence.
  • Figure 2: Solutions of the KV equations based on the 22 quadrupole magnet strengths $(G(z, t_0))$ at initial time $t_0$.
  • Figure 3: Architecture of the ES-DRL controller for accelerator tuning. A supervisor selects $\beta$ based on safety constraints and combines $u = \beta(o(t))u_{\mathrm{RL}} + (1-\beta(o(t)))u_{\mathrm{ES}}$. ES may be warm-started from DRL (dotted). The critic is not used during evaluation.
  • Figure 4: Perturbations and performance: (top) injected sinusoidal perturbations at $Q_1$ and $Q_{10}$ and the drift in segment length between $Q_{9}$ and $Q_{10}$ over 500 steps. (bottom) Resulting reward trajectories; the hybrid ES-DRL controller achieves the best overall reward.
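
The toy problem of Figure 1 and the blending rule of Figure 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a scalar plant $\dot{x} = b(t)\,u$ and the bounded ES input $u = \sqrt{\alpha\omega}\cos(\omega t + kC)$ with cost $C = -V(x)$, which is the form commonly used in the bounded ES literature and is not stated in this summary. The function names `blend` and `simulate_bounded_es` and all parameter values ($x_0$, $k$, $\alpha$, $\omega$, step size) are hypothetical choices made for illustration.

```python
import math


def blend(beta, u_rl, u_es):
    """Blend rule from the Figure 3 caption: u = beta*u_RL + (1 - beta)*u_ES.

    How the supervisor picks beta from safety constraints is not specified
    in this summary, so beta is simply passed in here.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return beta * u_rl + (1.0 - beta) * u_es


def simulate_bounded_es(x0=1.5, b0=1.0, f=0.1, k=2.0, alpha=1.0,
                        omega=50.0, dt=1e-3, t_end=40.0):
    """Forward-Euler simulation of dx/dt = b(t)*u under a bounded ES input.

    Assumed (not from the source): the scalar plant, the ES law
    u = sqrt(alpha*omega) * cos(omega*t + k*C), and every default value.
    From the Figure 1 caption: V(x) = exp(-x^2) and
    b(t) = b0*cos(2*pi*f*t).
    """
    x = x0
    n_steps = int(round(t_end / dt))
    for i in range(n_steps):
        t = i * dt
        # Control direction that changes sign over time.
        b = b0 * math.cos(2.0 * math.pi * f * t)
        # ES minimizes a cost, so maximizing V means using C = -V.
        cost = -math.exp(-x * x)
        # Bounded ES input: its magnitude never exceeds sqrt(alpha*omega),
        # whatever the measured cost is.
        u = math.sqrt(alpha * omega) * math.cos(omega * t + k * cost)
        x += dt * b * u
    return x, math.exp(-x * x)


if __name__ == "__main__":
    x_final, v_final = simulate_bounded_es()
    print(f"final x = {x_final:.3f}, final V = {v_final:.3f}")
    # Equal-weight blend of a DRL action and an ES action.
    print(blend(0.5, u_rl=0.4, u_es=-0.2))
```

Under these assumptions, the averaged dynamics are $\dot{\bar{x}} = -\tfrac{k\alpha}{2}\,b(t)^2\,\partial C/\partial x$. The control direction enters only through $b(t)^2$, which is why the sign changes of $b(t)$ do not reverse the search direction. The residual oscillation of $x$ around the optimum scales with $\sqrt{\alpha/\omega}$, so a larger $\omega$ trades a smaller dither for a faster-varying input.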