Transformer-Powered Surrogates Close the ICF Simulation-Experiment Gap with Extremely Limited Data

Matthew L. Olson; Shusen Liu; Jayaraman J. Thiagarajan; Bogdan Kustowski; Weng-Keen Wong; Rushil Anirudh

TL;DR

This work tackles the problem of bridging the simulation–experiment gap in inertial confinement fusion (ICF) when real experimental data are extremely limited. It introduces a transformer-based surrogate that is pretrained on large-scale simulations with masked autoencoding and surrogate losses and then fine-tuned on the scarce experimental data, coupled with a novel graph-based hyper-parameter optimization that counters overfitting and noisy validation signals. The methodology enables effective multi-modal transfer learning (scalars and X-ray images) and substantially reduces predictive error, with a ~40% relative improvement over state-of-the-art neural surrogates and robust performance on both real and synthetic benchmarks. The approach offers a scalable path to accurate, data-efficient prediction in physics-guided ML and can be extended to other domains with similar simulation–experiment gaps.
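The TL;DR does not define "relative improvement". Assuming it is the usual fractional reduction in error with respect to the baseline surrogate, it reads:

$$\Delta_{\text{rel}} = \frac{E_{\text{baseline}} - E_{\text{ours}}}{E_{\text{baseline}}}$$

With purely illustrative values $E_{\text{baseline}} = 0.10$ and $E_{\text{ours}} = 0.06$, this gives $\Delta_{\text{rel}} = 0.04 / 0.10 = 0.40$, i.e. a 40% relative improvement.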

Abstract

Recent advances in machine learning, specifically the transformer architecture, have driven significant progress in commercial domains. These powerful models have demonstrated a superior ability to learn complex relationships and often generalize better to new data and problems. This paper presents a novel transformer-powered approach for improving prediction accuracy in multi-modal output scenarios, where sparse experimental data are supplemented with simulation data. The proposed approach integrates a transformer-based architecture with a novel graph-based hyper-parameter optimization technique. The resulting system not only reduces simulation bias effectively but also achieves higher prediction accuracy than the prior state-of-the-art method. We demonstrate the efficacy of our approach on inertial confinement fusion experiments, for which only 10 shots of real-world data are available, as well as on synthetic versions of these experiments.
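The abstract does not spell out how the graph-based selection works. As a rough illustration of the idea shown in Figure 3 (smoothing a noisy validation error over related hyper-parameter settings before taking the minimum), here is a minimal sketch. It assumes the sweep is a regular grid, connects configurations that are one step apart along a single axis, and blends each node's error with the mean of its neighbors. The grid construction, the `alpha` blend, and the names `grid_neighbors`, `smooth_errors`, and `select_config` are hypothetical, and the error values are made up.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Hypothetical: a configuration is one index per hyper-parameter axis,
# e.g. (learning-rate index, weight-decay index) on a regular sweep grid.
Config = Tuple[int, ...]


def grid_neighbors(cfg: Config, sizes: Sequence[int]) -> List[Config]:
    """Return configs one step away from `cfg` along exactly one axis."""
    neighbors = []
    for axis, size in enumerate(sizes):
        for step in (-1, 1):
            j = cfg[axis] + step
            if 0 <= j < size:
                nb = list(cfg)
                nb[axis] = j
                neighbors.append(tuple(nb))
    return neighbors


def smooth_errors(val_err: Dict[Config, float], sizes: Sequence[int],
                  alpha: float = 0.5) -> Dict[Config, float]:
    """Blend each node's validation error with the mean of its neighbors.

    alpha = 1.0 recovers the raw validation error; smaller values lean
    more on the neighborhood, damping isolated lucky (noisy) minima.
    """
    smoothed = {}
    for cfg, err in val_err.items():
        nb_errs = [val_err[n] for n in grid_neighbors(cfg, sizes) if n in val_err]
        nb_mean = sum(nb_errs) / len(nb_errs) if nb_errs else err
        smoothed[cfg] = alpha * err + (1.0 - alpha) * nb_mean
    return smoothed


def select_config(val_err: Dict[Config, float], sizes: Sequence[int],
                  alpha: float = 0.5) -> Config:
    """Pick the configuration with the lowest graph-smoothed error."""
    smoothed = smooth_errors(val_err, sizes, alpha)
    return min(smoothed, key=smoothed.get)


# Toy 3 x 3 sweep with made-up errors: (0, 2) is an isolated lucky minimum,
# while (1, 0), (2, 0) and (2, 1) form a consistently good region.
GRID = [[0.9, 0.9, 0.1],
        [0.3, 0.9, 0.9],
        [0.3, 0.3, 0.9]]
VAL_ERR = {(i, j): GRID[i][j] for i, j in product(range(3), range(3))}

if __name__ == "__main__":
    print("raw argmin:", min(VAL_ERR, key=VAL_ERR.get))
    print("smoothed argmin:", select_config(VAL_ERR, sizes=(3, 3)))
```

On this toy sweep the raw minimum picks the isolated configuration (0, 2), whereas the smoothed error picks (2, 0) inside the consistently good region; setting `alpha=1.0` recovers plain minimum-validation-error selection.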

Paper Structure

This paper contains 23 sections, 10 equations, 5 figures, and 4 tables.

Introduction
Main Findings
Experimental Setup and Results
Comparative Statistical Evaluation of Hyper-parameter Selection Strategies
Diagnostic X-ray Images
Robust Hyper-parameter Optimization via Graph Smoothing
Effects of Increased Training Data
Extreme Case: One-Shot Learning
Analysis of Feature Embeddings Using CKA
Discussion
Methods
Formal Definitions
Problem Setup
Fine-tuning and model adaptation
Masked training with Transformer Surrogates
...and 8 more sections

Figures (5)

Figure 1: Our method has three distinct stages: first, pretraining on simulation data with masked autoencoding and surrogate losses; second, fine-tuning the model on the experimental data with a hyper-parameter sweep; and finally, selecting the best hyper-parameter settings with our novel graph-based selection.
Figure 2: Predictions of our method on the held-out test X-ray images after fine-tuning on the real training data, compared with the baseline. Pixel values represent the energy output of the experimental implosion: white pixels are high energy, purple pixels are lower energy, and black pixels are no energy. Zoom in to see the results more clearly. The test-set MSE of our method is lower than that of the baseline; while the image quality is not perfect, our model shows modest improvements over the baseline both qualitatively and quantitatively.
Figure 3: Hyper-parameter graph smoothing enables robust model selection from a noisy validation error. (Left) Validation error versus test error for scalar predictions on the experimental data. (Right) The proposed graph-smoothed validation error versus test error. We highlight the configurations with the minimum validation error and the minimum smoothed validation error, and find that smoothing suppresses the noise in the validation signal, yielding a robust, well-performing hyper-parameter selection.
Figure 4: Detailed results comparing the masking strategies $L_{\text{masked}}$ and $L_{\text{pred}}$, as well as model selection using the smoothed graph validation error $\mathrm{GSE}_{\min}$ versus the unsmoothed minimum validation error $\mathrm{VE}_{\min}$. Interestingly, masking helps on the $\mathcal{Y}$ dataset but not on $\mathcal{R}$. Furthermore, selecting with the graph-smoothed error is always an improvement.
Figure 5: Masked pretraining. Our novel multi-modal architecture uses both images and scalars as inputs and outputs of a transformer-based deep neural network. Transformers enable straightforward surrogate models as well as effective representation learning through masked autoencoding.
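Figures 1, 4 and 5 describe pretraining with masked autoencoding and surrogate objectives over a shared sequence of scalar and image tokens. The sketch below is not the paper's implementation; it assumes, hypothetically, that each scalar becomes one token, images are cut into flattened patches, masked autoencoding ($L_{\text{masked}}$) hides a random subset of all tokens, and the surrogate case (read here as $L_{\text{pred}}$, which is an assumption) hides every output token. The transformer itself is omitted, the names `patchify`, `random_mask`, `surrogate_mask`, and `masked_mse` are illustrative, and the token counts are made up.

```python
import random
from typing import List, Sequence

Token = List[float]  # hypothetical: one raw (pre-embedding) token


def patchify(image: Sequence[Sequence[float]], p: int) -> List[Token]:
    """Split an H x W image into flattened p x p patches (row-major order)."""
    h, w = len(image), len(image[0])
    if h % p or w % p:
        raise ValueError("image size must be divisible by the patch size")
    return [[image[r + i][c + j] for i in range(p) for j in range(p)]
            for r in range(0, h, p) for c in range(0, w, p)]


def random_mask(n_tokens: int, ratio: float, rng: random.Random) -> List[bool]:
    """Masked-autoencoding style: hide a random fraction of all tokens."""
    k = max(1, round(ratio * n_tokens))
    hidden = set(rng.sample(range(n_tokens), k))
    return [i in hidden for i in range(n_tokens)]


def surrogate_mask(n_in: int, n_out: int) -> List[bool]:
    """Surrogate style: show the inputs, hide every output token."""
    return [False] * n_in + [True] * n_out


def masked_mse(pred: Sequence[Token], target: Sequence[Token],
               mask: Sequence[bool]) -> float:
    """Mean squared error averaged over masked tokens only."""
    per_token = [sum((p - t) ** 2 for p, t in zip(pv, tv)) / len(tv)
                 for pv, tv, m in zip(pred, target, mask) if m]
    if not per_token:
        raise ValueError("mask hides no tokens")
    return sum(per_token) / len(per_token)


# Toy sample: 5 input scalars, 3 output scalars and a 4 x 4 output image
# cut into four 2 x 2 patches, giving 12 tokens in total.
inputs = [[0.2], [0.5], [0.1], [0.9], [0.4]]
outputs = [[1.0], [2.0], [3.0]]
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
tokens = inputs + outputs + patchify(image, 2)
n_in, n_out = len(inputs), len(tokens) - len(inputs)

if __name__ == "__main__":
    print(random_mask(len(tokens), 0.5, random.Random(0)))
    print(surrogate_mask(n_in, n_out))
```

Treating the surrogate as a fixed mask over the outputs is one way to read Figure 5's point that a single transformer can serve both as a surrogate and as a masked autoencoder; in a real model each token would first be projected to a shared embedding width.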
