TokaMind: A Multi-Modal Transformer Foundation Model for Tokamak Plasma Dynamics

Tobia Boschi, Andrea Loreti, Nicola C. Amorisco, Rodrigo H. Ordonez-Hurtado, Cécile Rousseau, George K. Holt, Eszter Székely, Alexander Whittle, Samuel Jackson, Adriano Agnello, Stanislas Pamela, Alessandra Pascale, Robert Akers, Juan Bernabe Moreno, Vassil Alexandrov, Mykhaylo Zayats

Abstract

We present TokaMind, an open-source foundation model framework for fusion plasma modeling, based on a Multi-Modal Transformer (MMT) and trained on heterogeneous tokamak diagnostics from the publicly available MAST dataset. TokaMind supports multiple data modalities (time-series, 2D profiles, and videos) with different sampling rates, robust missing-signal handling, and efficient task adaptation via selectively loading and freezing four model components. To represent multi-modal signals, we use a training-free Discrete Cosine Transform embedding (DCT3D) and provide a clean interface for alternative embeddings (e.g., Variational Autoencoders, VAEs). We evaluate TokaMind on the recently introduced MAST benchmark TokaMark, comparing training and embedding strategies. Our results show that fine-tuned TokaMind outperforms the benchmark baseline on all but one task, and that, for several tasks, lightweight fine-tuning yields better performance than training the same architecture from scratch under a matched epoch budget. These findings highlight the benefits of multi-modal pretraining for tokamak plasma dynamics and provide a practical, extensible foundation for future fusion modeling tasks. Training code and model weights will be made publicly available.

Paper Structure (52 sections, 12 equations, 2 figures, 5 tables)

Figures (2)

  • Figure 1: TokaMind tokenization and model architecture. Windowed multi-modal inputs $I$ and actuators $A$ are chunked and embedded by signal-specific codecs $E_g$ to produce token embeddings $z_i\in\mathbb{R}^{K_{g(i)}}$ (outputs are embedded at the window level to form targets in the same space). A Token Encoder projects each token to the shared model dimension $d$ and adds learned metadata embeddings (signal, role, modality, relative position). A Transformer Backbone processes the variable-length token set using an attention mask for missing/padded tokens and outputs the [CLS] (classification) token embedding. Modality-specific heads (TS: time series, P: Profile, V: Video) and per-output adapters predict embedded targets $\hat{y}_o\in\mathbb{R}^{K_o}$; a target-availability mask $m_o$ excludes missing outputs from the supervised loss.
  • Figure 2: Relative improvement in group-level test NRMSE over the CNN baseline on TokaMark. Positive values indicate lower NRMSE (better) than the baseline; negative values indicate worse performance.
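
The two captions above mention a target-availability mask $m_o$ that excludes missing outputs from the supervised loss, and a relative improvement in NRMSE whose sign is positive when the model beats the baseline. The following is a minimal sketch of both ideas, not the authors' implementation. The function names `masked_mse` and `relative_improvement` are hypothetical, the use of a mean-squared error is an assumption (the captions do not name the loss), and the formula (baseline − model) / baseline is assumed because it matches the stated sign convention.

```python
from typing import Sequence


def masked_mse(
    pred: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
    mask: Sequence[int],
) -> float:
    """Mean squared error over the outputs whose availability mask is 1.

    pred[o] and target[o] are the embedded prediction and target for output o;
    mask[o] plays the role of m_o (1 = target available, 0 = missing).
    Squared error is an assumption; the source only says masked outputs
    are excluded from the supervised loss.
    """
    total, count = 0.0, 0
    for y_hat, y, m in zip(pred, target, mask):
        if not m:
            continue  # missing output: contributes nothing to the loss
        total += sum((a - b) ** 2 for a, b in zip(y_hat, y))
        count += len(y)
    # With no available targets there is nothing to supervise.
    return total / count if count else 0.0


def relative_improvement(nrmse_model: float, nrmse_baseline: float) -> float:
    """Assumed definition: positive when the model's NRMSE is below the baseline's."""
    return (nrmse_baseline - nrmse_model) / nrmse_baseline


if __name__ == "__main__":
    # The second output is flagged missing, so its large error is ignored.
    loss = masked_mse([[1.0, 2.0], [5.0, 5.0]], [[1.0, 4.0], [0.0, 0.0]], [1, 0])
    print(loss)
    print(relative_improvement(0.8, 1.0))  # model better than baseline
    print(relative_improvement(1.5, 1.0))  # model worse than baseline
```

Normalizing by the number of available coefficients, rather than by the total number of outputs, keeps the loss scale comparable across windows with different amounts of missing data; whether TokaMind normalizes this way is not stated in the captions.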