
Physics-Informed Neural Compression of High-Dimensional Plasma Data

Gianluca Galletti, Gerald Gutenbrunner, Sandeep S. Cranganore, William Hornsby, Lorenzo Zanisi, Naomi Carey, Stanislas Pamela, Johannes Brandstetter, Fabian Paischer

TL;DR

This work tackles the storage burden of high-fidelity, 5D gyrokinetic turbulence data by introducing Physics-Informed Neural Compression (PINC), which enforces gyrokinetics-specific physics losses during compression. It combines two learned representations—a 5D Swin-transformer autoencoder and neural implicit fields—with gyrokinetics-aware losses that preserve integral quantities like $Q$ and $\bm{\phi}$ and turbulence spectra such as $k_y^{\text{spec}}$, while monitoring temporal fidelity via energy-cascade Wasserstein distance and End-Point Error. The authors present a spatiotemporal evaluation pipeline, conduct extensive ablations, and demonstrate compression ratios exceeding $70{,}000\times$, rising to $120{,}000\times$ with entropy coding, all with markedly better physics fidelity than traditional baselines. The approach enables post-hoc analyses that were previously infeasible due to data size, offering a scalable path to preserving physically meaningful information in large-scale plasma simulations. The work provides practical guidance for physics-informed compression and sets a benchmark for future studies in neural compression of turbulent, high-dimensional scientific data.
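The idea of pairing a reconstruction objective with a physics term can be illustrated with a minimal NumPy sketch. Everything named here is an assumption for illustration: the array layout (three spatial axes followed by two velocity axes), the uniform quadrature `weights`, the helper `integral_quantity`, and the weighting `lam` are hypothetical and not taken from the paper, whose actual definitions of $Q$ and $\bm{\phi}$ are not given in this summary.

```python
import numpy as np

def integral_quantity(f, weights):
    """Hypothetical stand-in for an integrated diagnostic: a weighted sum
    of the 5D field over its last two (velocity) axes."""
    return np.einsum("...vw,vw->...", f, weights)

def pinc_style_loss(f_true, f_rec, weights, lam=1.0):
    """Reconstruction MSE plus a penalty on the mismatch of the integral.

    Returns (total, reconstruction term, physics term).
    """
    recon = np.mean((f_true - f_rec) ** 2)
    q_true = integral_quantity(f_true, weights)
    q_rec = integral_quantity(f_rec, weights)
    physics = np.mean((q_true - q_rec) ** 2)
    return recon + lam * physics, recon, physics

rng = np.random.default_rng(0)
shape = (4, 4, 4, 6, 5)          # illustrative layout: 3 spatial + 2 velocity axes
f_true = rng.normal(size=shape)
f_rec = f_true + 0.01 * rng.normal(size=shape)   # mock "decompressed" field
weights = np.ones(shape[-2:])    # uniform quadrature weights (assumption)

total, recon, physics = pinc_style_loss(f_true, f_rec, weights, lam=0.5)
print(f"total={total:.3e}  recon={recon:.3e}  physics={physics:.3e}")
```

The point of the second term is that a reconstruction can have a small pointwise error while still shifting an integrated quantity; penalizing the integral directly targets that failure mode.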

Abstract

High-fidelity scientific simulations are now producing unprecedented amounts of data, creating a storage and analysis bottleneck. A single simulation can generate tremendous data volumes, often forcing researchers to discard valuable information. A prime example of this is plasma turbulence described by the gyrokinetic equations: nonlinear, multiscale, and 5D in phase space. It constitutes one of the most computationally demanding frontiers of modern science, with runs taking weeks and yielding tens of terabytes of data dumps. The increasing storage demands underscore the importance of compression. However, reconstructed snapshots do not necessarily preserve essential physical quantities. We present a spatiotemporal evaluation pipeline, accounting for structural phenomena and multi-scale transient fluctuations, to assess the degree of physical fidelity. Indeed, we find that various compression techniques fail to preserve both spatial mode structure and temporal turbulence characteristics. Therefore, we explore Physics-Informed Neural Compression (PINC), which incorporates physics-informed losses tailored to gyrokinetics and enables extreme compression ratios of over 70,000x. Entropy coding on top of PINC further pushes this to 120,000x. This direction provides a viable and scalable solution to the prohibitive storage demands of gyrokinetics, enabling post-hoc analyses that were previously infeasible.
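To give a sense of scale for the quoted ratios, the arithmetic can be sketched as follows. The 10 TB raw size is a hypothetical stand-in for "tens of terabytes" (decimal units assumed), and `compressed_size_bytes` is an illustrative helper, not something from the paper.

```python
def compressed_size_bytes(raw_bytes, compression_ratio):
    """Size after compression, given raw size and ratio (raw / compressed)."""
    return raw_bytes / compression_ratio

TB = 10**12          # decimal terabyte (assumption)
MB = 10**6
raw = 10 * TB        # hypothetical size of one simulation dump

for cr in (70_000, 120_000):
    size_mb = compressed_size_bytes(raw, cr) / MB
    print(f"CR {cr:,}x -> {size_mb:.1f} MB")
```

Under these assumptions, a 10 TB dump would shrink to roughly 143 MB at 70,000x and roughly 83 MB at 120,000x.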

Paper Structure (24 sections, 27 equations, 12 figures, 14 tables)


Figures (12)

  • Figure 1: Sketch of the training and evaluation for PINC models. Training is done at individual time snapshots, while evaluation considers turbulence taking both spatial and temporal information into account.
  • Figure 2: Rate-distortion performance, shown as Peak Signal-to-Noise Ratio (PSNR) versus Compression Rate (CR), on 3 randomly sampled timesteps from each of 10 trajectories (30 total samples).
  • Figure 3: Scaling of the physics losses $\mathcal{L}_{Q}$ (top) and $\mathcal{L}_{\bm{\phi}}$ (bottom) with Compression Rate (log-log). NF and VQ-VAE + EVA are trained with PINC losses. The arrow denotes the $\Delta\mathcal{L}$ improvement for VQ-VAE.
  • Figure 4: Energy cascade visualized as the transfer from higher to lower modes as turbulence develops. Plots in log-log scale.
  • Figure 5: 5D Swin attention.
  • ...and 7 more figures
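
The two kinds of metric named in the captions above, PSNR for rate-distortion and a Wasserstein distance for spectra, can be sketched generically in NumPy. This is a sketch under stated assumptions, not the paper's evaluation code: the peak in `psnr` is taken as the data range of the reference, and `wasserstein_1d` is the standard 1D Wasserstein-1 distance between two normalized spectra on a shared wavenumber grid. The paper's exact normalization and its energy-cascade variant are not specified in this summary.

```python
import numpy as np

def psnr(x, x_hat):
    """PSNR in dB, using the reference data range as the peak (assumption)."""
    mse = np.mean((x - x_hat) ** 2)
    peak = x.max() - x.min()
    return 10.0 * np.log10(peak**2 / mse)

def wasserstein_1d(p, q, k):
    """Wasserstein-1 distance between two non-negative spectra p, q
    sampled on the same sorted grid k, after normalizing each to sum 1."""
    p = p / p.sum()
    q = q / q.sum()
    cdf_gap = np.abs(np.cumsum(p) - np.cumsum(q))
    # Integrate |CDF_p - CDF_q| over k with the grid spacing as measure.
    return np.sum(cdf_gap[:-1] * np.diff(k))

# Toy data: a reference signal and a slightly perturbed reconstruction.
rng = np.random.default_rng(0)
x = rng.normal(size=1024)
x_hat = x + 0.05 * rng.normal(size=1024)

k = np.arange(1, 65, dtype=float)    # mock wavenumber grid
spec_ref = k ** -2.0                 # mock reference spectrum
spec_rec = k ** -2.2                 # mock reconstructed spectrum (steeper)

print(f"PSNR: {psnr(x, x_hat):.2f} dB")
print(f"W1 between spectra: {wasserstein_1d(spec_ref, spec_rec, k):.4f}")
```

PSNR measures pointwise error only, while the spectral distance responds to energy being redistributed across modes, which is why the two can disagree about which reconstruction is better.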