
Full Shot Predictions for the DIII-D Tokamak via Deep Recurrent Networks

Ian Char, Youngseog Chung, Joseph Abbate, Egemen Kolemen, Jeff Schneider

TL;DR

This work tackles the problem of forecasting the full time evolution of tokamak plasmas by learning a data-driven dynamics model from 7,884 DIII-D shots. It uses an encoder–GRU–decoder architecture that predicts state changes at 25 ms intervals, with dual Gaussian heads that output a mean and a variance and are trained with a negative log-likelihood objective, which enables predictive uncertainty quantification. Through extensive ablations, the authors demonstrate the benefits of ensemble-based distributional predictions, compare GRU, LSTM, and MLP recurrent units, and show that distributional outputs improve long-horizon accuracy and calibration relative to point predictions. The results show calibrated, long-horizon forecasts across multiple plasma diagnostics, highlighting the potential for data-driven control and actuator optimization in fusion devices.
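To make the training objective concrete, the following is a minimal sketch of a Gaussian negative log-likelihood over a predicted state change, assuming independent output dimensions and a log-variance parameterization. The function name `gaussian_nll` and the toy numbers are illustrative assumptions, not the authors' code.

```python
import math


def gaussian_nll(target_delta, mean, log_var):
    """Average Gaussian negative log-likelihood across output dimensions.

    target_delta: true state change (next state minus current state).
    mean, log_var: outputs of the two heads, one value per dimension.
    Assumes a diagonal Gaussian (dimensions treated independently).
    """
    total = 0.0
    for y, mu, lv in zip(target_delta, mean, log_var):
        # 0.5 * [log(2*pi) + log(sigma^2) + (y - mu)^2 / sigma^2]
        total += 0.5 * (math.log(2.0 * math.pi) + lv + (y - mu) ** 2 / math.exp(lv))
    return total / len(target_delta)


# Hypothetical two-dimensional example
exact = gaussian_nll([0.1, -0.2], [0.1, -0.2], [0.0, 0.0])
off = gaussian_nll([0.1, -0.2], [0.5, 0.3], [0.0, 0.0])
```

With the variance held fixed, a prediction farther from the target gives a larger loss, so `off` exceeds `exact`. Because the variance is also learned, the model can lower the loss on hard-to-predict steps by widening its distribution, which is where the uncertainty estimates come from.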

Abstract

Although tokamaks are one of the most promising devices for realizing nuclear fusion as an energy source, there are still key obstacles when it comes to understanding the dynamics of the plasma and controlling it. As such, it is crucial that high-quality models are developed to assist in overcoming these obstacles. In this work, we take an entirely data-driven approach to learn such a model. In particular, we use historical data from the DIII-D tokamak to train a deep recurrent network that is able to predict the full time evolution of plasma discharges (or "shots"). Following this, we investigate how different training and inference procedures affect the quality and calibration of the shot predictions.

Paper Structure

This paper contains 14 sections, 3 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Architecture for the Recurrent Model. The encoder is a single-layer MLP that embeds the states, actuators, and next actuators into a 512-dimensional space. This embedding is fed to the GRU unit, which outputs a 128-dimensional embedding that is concatenated with the original embedding before being fed to the decoder. The double-headed outputs are single linear layers that output the mean and log variance of a Gaussian. The circled plus signs denote residual connections.
  • Figure 2: Replay of a Test Set Shot. The replay was generated with an ensemble of models, each of which samples from its own Gaussian distribution at each step. While the model has access to the true actuators throughout the entire shot, it only takes in the first true state and autoregressively predicts the rest. Each faded blue line shows one sampled trajectory, while the darker blue line shows the average over the trajectories. The black lines show the true values for the experiment. The top row shows the reconstructed profiles at the last time step. Here, the x-axis spans the minor radius of the tokamak, where 0 is closest to the magnetic axis and 33 is closest to the wall. The other plots show the scalar values over time, with the x-axis showing the time into the shot in ms.
  • Figure 3: Explained Variance per Time Step. Each colored line shows a different way of generating trajectories with the same models. The blue curves take the mean of the Gaussian distribution at every step, while the red curves sample from the Gaussian distribution at every step. Each curve shows the mean over four models trained with different random seeds. The shaded area shows the standard error. The bottom row shows the explained variance (EV) for the first principal component of the corresponding profiles.
  • Figure 4: Uncertainty Metrics over Time. The leftmost plot shows coverage of the 90% prediction interval. Models with good predictive uncertainties should therefore match this 90% (shown as a dotted black line). For the miscalibration area plot, a lower score indicates a better calibrated model. Each metric is averaged over all output dimensions. The curves show the mean over four models, and the shaded region shows the standard error.
  • Figure 5: Explained Variance Averaged over All Output Dimensions. Each curve was generated by taking the average over four different trained models. The shaded area shows the standard error. All curves were generated by taking the mean output of the predicted Gaussian distributions (where applicable).
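
The replay procedure and the metrics named in the captions above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: `toy_model` is a hypothetical stand-in for a trained ensemble member, the model interface `(state, actuators, next_actuators) -> (mean, log_var)` is assumed from the Figure 1 caption, explained variance uses the common definition 1 - Var(truth - prediction) / Var(truth), and coverage is estimated from empirical quantiles of sampled trajectories.

```python
import math
import random


def rollout(model, first_state, actuators, rng, sample=True):
    """Replay one shot: only the first state is true, the rest is predicted."""
    state = list(first_state)
    trajectory = [state]
    for t in range(len(actuators) - 1):
        mean, log_var = model(state, actuators[t], actuators[t + 1])
        if sample:
            # Draw the state change from the predicted Gaussian.
            delta = [rng.gauss(m, math.exp(0.5 * lv)) for m, lv in zip(mean, log_var)]
        else:
            # Use the Gaussian mean as a point prediction.
            delta = list(mean)
        state = [s + d for s, d in zip(state, delta)]
        trajectory.append(state)
    return trajectory


def explained_variance(truth, pred):
    """1 - Var(truth - pred) / Var(truth) for one scalar signal."""
    def var(xs):
        mu = sum(xs) / len(xs)
        return sum((x - mu) ** 2 for x in xs) / len(xs)
    resid = [y - p for y, p in zip(truth, pred)]
    return 1.0 - var(resid) / var(truth)  # undefined if truth is constant


def interval_coverage(truth, samples, level=0.9):
    """Fraction of true values inside the central `level` interval.

    samples[i] holds the sampled predictions corresponding to truth[i].
    """
    lo_q = (1.0 - level) / 2.0
    hits = 0
    for y, draws in zip(truth, samples):
        ordered = sorted(draws)
        lo = ordered[int(lo_q * (len(ordered) - 1))]
        hi = ordered[int(math.ceil((1.0 - lo_q) * (len(ordered) - 1)))]
        hits += lo <= y <= hi
    return hits / len(truth)


def toy_model(state, act, next_act):
    """Hypothetical dynamics: the state relaxes halfway toward the actuator."""
    mean = [0.5 * (a - s) for s, a in zip(state, act)]
    log_var = [-4.0 for _ in state]
    return mean, log_var


rng = random.Random(0)
actuators = [[1.0]] * 6  # true actuators are known for the whole shot
mean_traj = rollout(toy_model, [0.0], actuators, rng, sample=False)
sampled = [rollout(toy_model, [0.0], actuators, rng) for _ in range(20)]
```

Here `mean_traj` corresponds to the mean-based trajectories and `sampled` to the sampling-based ones. Passing the sampled values at a fixed time step to `interval_coverage` gives one point on a coverage-over-time curve; a well-calibrated model would land near the nominal 0.9.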