Active Disruption Avoidance and Trajectory Design for Tokamak Ramp-downs with Neural Differential Equations and Reinforcement Learning

Allen M. Wang, Oswin So, Charles Dawson, Darren T. Garnier, Cristina Rea, Chuchu Fan

Abstract

The tokamak offers a promising path to fusion energy, but plasma disruptions pose a major economic risk, motivating considerable advances in disruption avoidance. This work develops a reinforcement learning approach to this problem by training a policy to safely ramp down the plasma current while avoiding limits on a number of quantities correlated with disruptions. The policy training environment is a hybrid physics and machine learning model trained on simulations of the SPARC primary reference discharge (PRD) ramp-down, an upcoming burning plasma scenario which we use as a testbed. To address physics uncertainty and model inaccuracies, the simulation environment is massively parallelized on GPU with randomized physics parameters during policy training. The trained policy is then transferred to a higher-fidelity simulator, where it successfully ramps down the plasma while avoiding user-specified disruptive limits. We also address the crucial issue of safety criticality by demonstrating that a constraint-conditioned policy can be used as a trajectory design assistant to design a library of feed-forward trajectories that handle different physics conditions and user settings. As a library of trajectories is more interpretable and verifiable offline, we argue such an approach is a promising path for leveraging the capabilities of reinforcement learning in the safety-critical context of burning plasma tokamaks. Finally, we demonstrate how the training environment can be a useful platform for other feed-forward optimization approaches by using an evolutionary algorithm to optimize feed-forward trajectories that are robust to physics uncertainty.
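
To make the idea of a feed-forward trajectory that is robust to physics uncertainty more concrete, the following is a minimal toy sketch and not the paper's model or algorithm. It assumes a one-variable ramp-down in which the plasma current falls by `rate * DT` each step, a single hypothetical uncertain parameter `k` that scales how strongly the ramp rate drives a constrained quantity, and a simple (1+λ) evolution strategy that minimizes the worst-case cost over sampled values of `k`. All names, constants, and the cost function are illustrative assumptions; only the 2 MA target is taken from the Figure 2 caption.

```python
import random

IP0 = 8.0        # illustrative initial plasma current [MA] (assumption)
IP_TARGET = 2.0  # target: end below 2 MA (from the Figure 2 caption)
DT = 1.0         # illustrative time step [s] (assumption)
LIMIT = 1.0      # illustrative limit on the toy constrained quantity (assumption)


def rollout_cost(rates, k):
    """Cost of one open-loop toy ramp-down for one uncertain parameter k."""
    ip = IP0
    violation = 0.0
    for r in rates:
        ip -= r * DT                          # ramp the current down
        violation += max(0.0, k * r - LIMIT)  # toy constraint: k * rate <= LIMIT
    miss = max(0.0, ip - IP_TARGET)           # penalty for ending above the target
    return miss + 10.0 * violation


def robust_cost(rates, ks):
    """Worst-case cost over all sampled physics parameters."""
    return max(rollout_cost(rates, k) for k in ks)


def evolve(n_steps=10, n_children=20, n_gens=100, sigma=0.1, seed=0):
    """(1+lambda) evolution strategy over a sequence of ramp rates."""
    rng = random.Random(seed)
    ks = [rng.uniform(0.8, 1.2) for _ in range(16)]  # randomized toy parameters
    parent = [0.0] * n_steps                         # start from "no ramp-down"
    parent_cost = robust_cost(parent, ks)
    for _ in range(n_gens):
        # Mutate the parent with Gaussian noise; keep ramp rates non-negative.
        children = [
            [max(0.0, r + rng.gauss(0.0, sigma)) for r in parent]
            for _ in range(n_children)
        ]
        best_child = min(children, key=lambda c: robust_cost(c, ks))
        cost = robust_cost(best_child, ks)
        if cost < parent_cost:                       # elitist selection
            parent, parent_cost = best_child, cost
    return parent, parent_cost


if __name__ == "__main__":
    rates, cost = evolve()
    print("robust cost:", cost)
    print("ramp rates [MA/s]:", [round(r, 3) for r in rates])
```

Scoring each candidate by its worst case across the sampled parameters, rather than by a single nominal rollout, is what makes the resulting trajectory robust in this toy setting; a mean or quantile over samples would be an equally plausible choice.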

Paper Structure

This paper contains 4 sections, 19 equations, 9 figures, and 2 tables.

Figures (9)

  • Figure 1: Simplified diagram of the proposed workflow for using a trained control policy as a trajectory design assistant.
  • Figure 2: Constraint-conditioned library of trajectories. Library of trajectories generated under flexible (a) and more stringent (b) constraints. The green shaded region in the $I_p$ plots denotes the target plasma current of below 2 MA, while the red shaded regions denote the constraint values. The constraint values conditioned on for each scenario are visualized as red bars on the right of each plot, shown relative to the minimum and maximum values used for training.
  • Figure 3: Diagram depicting the sim2sim transfer setup.
  • Figure 4: Comparison of RAPTOR simulation results from a naive baseline feed-forward trajectory (red) against the PPO-trained policy running in closed loop (blue). The PPO policy clearly yields a considerable reduction in constraint violation. While the nominal SPARC PRD ramp-down has a constant ramp-rate of 1 MA/s, our baseline was selected to have the same average ramp-rate as the PPO policy to provide a better comparison between the two cases. Note that while the nominal action space is the rates of change of $P_{aux}$ and $g_s$, the plot shows the time-integrated values for interpretability.
  • Figure 5: (Left) results from parallel PopDownGym instances running the same feed-forward trajectory found via robust optimization. (Right) results from RAPTOR simulation running the feed-forward trajectory found via robust optimization on PopDownGym.
  • ...and 4 more figures
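
The Figure 4 caption notes that the policy's actions are rates of change, while the plot shows their time-integrated values. A minimal sketch of that conversion, assuming a fixed time step and simple forward-Euler accumulation (the function name, step size, and sample numbers are hypothetical and not taken from the paper):

```python
def integrate_rates(x0, rates, dt):
    """Accumulate rate-of-change actions into the signal they produce.

    x0    : initial value of the actuated quantity
    rates : sequence of rate-of-change actions, one per time step
    dt    : time step in seconds (assumed constant here)
    """
    values = [x0]
    for r in rates:
        values.append(values[-1] + r * dt)  # forward-Euler update
    return values


if __name__ == "__main__":
    # Hypothetical example: a quantity starting at 1.0 and ramped at -0.5 per second.
    print(integrate_rates(1.0, [-0.5, -0.5], 0.5))
```

Plotting the accumulated values rather than the raw rates shows the actuator trace an operator would actually see, which is the interpretability motivation stated in the caption.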