Gym-TORAX: Open-source software for integrating RL with plasma control simulators

Antoine Mouchamps, Arthur Malherbe, Adrien Bolland, Damien Ernst

TL;DR

Gym-TORAX is a Python package for implementing Reinforcement Learning environments that simulate plasma dynamics and control in tokamaks. Users define a set of control actions and observations, together with a control objective, from which Gym-TORAX creates a Gymnasium environment that wraps TORAX for simulating the plasma dynamics.

Abstract

This paper presents Gym-TORAX, a Python package enabling the implementation of Reinforcement Learning (RL) environments for simulating plasma dynamics and control in tokamaks. Users succinctly define a set of control actions and observations, together with a control objective, from which Gym-TORAX creates a Gymnasium environment that wraps TORAX for simulating the plasma dynamics. The objective is formulated through rewards that depend on the simulated plasma state and the control action, so as to optimize specific characteristics of the plasma, such as performance and stability. The resulting environment instance is compatible with a wide range of RL algorithms and libraries and will facilitate RL research in plasma control. In its current version, one environment is readily available, based on a ramp-up scenario of the International Thermonuclear Experimental Reactor (ITER).
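To make the action / observation / reward structure described in the abstract concrete, the following is a minimal sketch of an environment that follows the Gymnasium `reset`/`step` calling convention. It is not the Gym-TORAX API and does not call TORAX: the class name `ToyRampUpEnv`, the scalar "plasma current" state, the first-order relaxation dynamics, the actuator range, and the reward weights are all hypothetical placeholders chosen so the example runs with the standard library alone.

```python
class ToyRampUpEnv:
    """Toy stand-in that mimics the Gymnasium reset/step convention.

    Assumption: the plasma state is reduced to one scalar current (MA)
    that relaxes toward the commanded value. Real plasma dynamics would
    come from a simulator such as TORAX, which is not used here.
    """

    def __init__(self, target_current=15.0, horizon=50):
        self.target_current = target_current  # hypothetical target, in MA
        self.horizon = horizon                # control steps per episode
        self._current = 0.0
        self._t = 0

    def _observe(self):
        # Observation: the subset of the state exposed to the agent.
        return {"current": self._current, "step": self._t}

    def reset(self):
        """Start a new episode and return (observation, info)."""
        self._current = 0.0
        self._t = 0
        return self._observe(), {}

    def step(self, action):
        """Apply one action and return (obs, reward, terminated, truncated, info)."""
        # Action: commanded current, clipped to a hypothetical actuator range.
        commanded = min(max(float(action), 0.0), 20.0)
        # Placeholder dynamics: first-order relaxation toward the command.
        self._current += 0.2 * (commanded - self._current)
        self._t += 1
        # Reward depends on both the simulated state and the action:
        # a tracking error term plus a small actuation cost.
        reward = -abs(self._current - self.target_current) - 0.01 * commanded
        terminated = False                    # no failure condition in this toy
        truncated = self._t >= self.horizon   # episode ends at the time limit
        return self._observe(), reward, terminated, truncated, {}


def rollout(env, policy):
    """Run one episode and return (sum of rewards, final observation)."""
    obs, _ = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, _ = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total, obs


if __name__ == "__main__":
    # Constant policy: always command the target current.
    episode_return, final_obs = rollout(ToyRampUpEnv(), lambda obs: 15.0)
    print(episode_return, final_obs)
```

Because the sketch exposes the same `reset`/`step` return tuples that Gymnasium environments use, the `rollout` loop is the same interaction pattern an RL library would run against a Gymnasium-compatible environment.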

Paper Structure

This paper contains 11 sections, 2 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Update loop in a single TORAX simulation step. Given the plasma state at the current simulation time step and the time series, TORAX solves the system of equations to compute the plasma state at the next time step. Derived quantities, such as performance metrics, are then calculated from this updated state. In the figure, dashed boxes represent variables computed iteratively, while plain boxes represent time series known in advance for the whole simulation.
  • Figure 2: Heatmap of the expected return $J$ over a subset of the full parameter space. To improve the clarity of the figure, the scale used to represent $J$ values is clipped at a minimum of $3.7$.
  • Figure 3: Comparison of one action (total current) trajectory for each policy.
  • Figure 4: Evolution of the current density with respect to the target.