Physics-Informed Neural Compression of High-Dimensional Plasma Data
Gianluca Galletti, Gerald Gutenbrunner, Sandeep S. Cranganore, William Hornsby, Lorenzo Zanisi, Naomi Carey, Stanislas Pamela, Johannes Brandstetter, Fabian Paischer
TL;DR
This work tackles the storage burden of high-fidelity, 5D gyrokinetic turbulence data by introducing Physics-Informed Neural Compression (PINC), which enforces gyrokinetics-specific physics losses during compression. It combines two learned representations—a 5D Swin-transformer autoencoder and neural implicit fields—with gyrokinetics-aware losses that preserve integral quantities like $Q$ and $\bm{\phi}$ and turbulence spectra such as $k_y^{\text{spec}}$, while monitoring temporal fidelity via energy-cascade Wasserstein distance and End-Point Error. The authors present a spatiotemporal evaluation pipeline, conduct extensive ablations, and demonstrate compression ratios exceeding $70{,}000\times$, rising to $120{,}000\times$ with entropy coding, all with markedly better physics fidelity than traditional baselines. The approach enables post-hoc analyses that were previously infeasible due to data size, offering a scalable path to preserving physically meaningful information in large-scale plasma simulations. The work provides practical guidance for physics-informed compression and sets a benchmark for future studies in neural compression of turbulent, high-dimensional scientific data.
Abstract
High-fidelity scientific simulations now produce unprecedented amounts of data, creating a storage and analysis bottleneck. A single simulation can generate tremendous data volumes, often forcing researchers to discard valuable information. A prime example is plasma turbulence described by the gyrokinetic equations: nonlinear, multiscale, and 5D in phase space. It constitutes one of the most computationally demanding frontiers of modern science, with runs taking weeks and yielding tens of terabytes of data dumps. These increasing storage demands underscore the importance of compression. However, reconstructed snapshots do not necessarily preserve essential physical quantities. We present a spatiotemporal evaluation pipeline that accounts for structural phenomena and multi-scale transient fluctuations to assess the degree of physical fidelity. We find that various compression techniques fail to preserve both spatial mode structure and temporal turbulence characteristics. Therefore, we explore Physics-Informed Neural Compression (PINC), which incorporates physics-informed losses tailored to gyrokinetics and enables extreme compression ratios of over 70,000x. Entropy coding on top of PINC pushes this further to 120,000x. This direction provides a viable and scalable solution to the prohibitive storage demands of gyrokinetics, enabling post-hoc analyses that were previously infeasible.
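To give a sense of scale for the quoted compression ratios, the following minimal sketch converts a raw data volume into its compressed size. The 10 TB run size is an assumed illustrative value chosen to match "tens of terabytes"; it is not a figure reported by the authors. The helper `compressed_size_bytes` is hypothetical and not part of any PINC code.

```python
def compressed_size_bytes(raw_bytes: float, ratio: float) -> float:
    """Return the size after compression for a given compression ratio."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return raw_bytes / ratio


TB = 1e12  # decimal terabyte, in bytes
MB = 1e6   # decimal megabyte, in bytes

# Assumption: a 10 TB run, used only as an example of "tens of terabytes".
raw = 10 * TB

# Ratios quoted in the abstract: PINC alone, then PINC with entropy coding.
for ratio in (70_000, 120_000):
    size_mb = compressed_size_bytes(raw, ratio) / MB
    print(f"{ratio:>7,}x -> {size_mb:.1f} MB")
```

Under this assumption, a 10 TB dump shrinks to roughly 143 MB at 70,000x and roughly 83 MB at 120,000x, which is small enough to keep many runs on a single workstation.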
