SuperBench: A Super-Resolution Benchmark Dataset for Scientific Machine Learning

Pu Ren; N. Benjamin Erichson; Junyi Guo; Shashank Subramanian; Omer San; Zarija Lukic; Michael W. Mahoney

TL;DR

SuperBench introduces the first standardized, high-resolution super-resolution (SR) benchmark for Scientific ML, combining fluid turbulence (NSKT), cosmology hydrodynamics (Nyx), and ERA5 weather data at resolutions up to $2048\times2048$, totaling roughly 439 GB. It defines realistic degradation modes (bicubic down-sampling, uniform down-sampling with noise, and low-resolution (LR) simulations) and evaluates SR methods with pixel-level, perceptual, and domain-specific physics metrics, highlighting how hard it is for purely data-driven approaches to preserve fundamental physical laws. Baselines including SwinIR, FNO, and physics-constrained variants show that incorporating domain knowledge (e.g., continuity constraints, energy spectrum alignment) improves physical fidelity beyond what pixel accuracy alone captures. The dataset and evaluation framework are openly hosted and designed to extend to temporal/spatiotemporal and 3D SR, aiming to accelerate science-driven SR research while encouraging responsible use and reproducibility.
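To make the "uniform down-sampling with noise" degradation mode concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the benchmark's own preprocessing code: the function name, the `noise_ratio` parameter, and the choice to scale the Gaussian noise by the standard deviation of the down-sampled field are all hypothetical.

```python
import numpy as np


def uniform_downsample_with_noise(hr, factor, noise_ratio=0.05, seed=0):
    """Degrade a 2D high-resolution field into a noisy low-resolution one.

    Assumptions (not taken from the paper): noise is zero-mean Gaussian and
    its standard deviation is `noise_ratio` times the standard deviation of
    the down-sampled field.
    """
    hr = np.asarray(hr, dtype=np.float64)
    if hr.ndim != 2:
        raise ValueError("expected a 2D field")
    if factor < 1:
        raise ValueError("factor must be a positive integer")

    # Uniform down-sampling: keep every `factor`-th grid point along each axis.
    lr = hr[::factor, ::factor]

    # Additive Gaussian noise, scaled relative to the field's variability.
    rng = np.random.default_rng(seed)
    sigma = noise_ratio * lr.std()
    return lr + sigma * rng.standard_normal(lr.shape)


if __name__ == "__main__":
    hr_field = np.random.default_rng(1).standard_normal((128, 128))
    lr_field = uniform_downsample_with_noise(hr_field, factor=8)
    print(lr_field.shape)
```

With `factor=8`, a 128-by-128 field becomes 16-by-16, which corresponds to a $\times 8$ SR task when the model is asked to recover the original grid. Setting `noise_ratio=0` reduces the function to plain uniform down-sampling.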

Abstract

Super-resolution (SR) techniques aim to enhance data resolution, enabling the retrieval of finer details and improving the overall quality and fidelity of the data representation. There is growing interest in applying SR methods to complex spatiotemporal systems within the Scientific Machine Learning (SciML) community, with the hope of accelerating numerical simulations and/or improving forecasts in weather, climate, and related areas. However, the lack of standardized benchmark datasets for comparing and validating SR methods hinders progress and adoption in SciML. To address this, we introduce SuperBench, the first benchmark dataset featuring high-resolution data from fluid flows, cosmology, and weather. Here, we focus on validating spatial SR performance from data-centric and physics-preserving perspectives, as well as assessing robustness to data degradation. While deep learning-based SR methods (developed in the computer vision community) excel on certain tasks despite using relatively little prior physics information, we identify limitations of these methods in accurately capturing intricate fine-scale features and preserving fundamental physical properties and constraints in scientific data. These shortcomings highlight the importance and subtlety of incorporating domain knowledge into ML models. We anticipate that SuperBench will help to advance SR methods for science.

Paper Structure (58 sections, 4 equations, 17 figures, 14 tables)

Introduction
Related Work
Description of SuperBench
Datasets
Navier-Stokes Kraichnan Turbulence (NSKT) Fluid Flows
Data.
Cosmology Hydrodynamics
Data.
Weather
Data.
Data Preprocessing
Evaluation Metrics
Pixel-level difference.
Human-level perception.
Domain-motivated error metrics.
...and 43 more sections

Figures (17)

Figure 1: High-resolution data are paramount to accurately resolving the turbulent dynamics of Earth's weather systems. For instance, resolving storms requires kilometer-scale resolution, and some crucial climate processes can require resolutions on the order of 1 m. The snapshots on the left show coarse-grained data that can be thought of as a down-sampled representation of the fine-scale data on the right. Coarse-grained data not only fail to capture the small scales, but also fail to account for the impact of these small scales on the large-scale dynamics, and for the impact of fine (and critical) topographic features such as mountain ranges on either scale. Currently, generating such high-resolution and accurate data demands prohibitive computational resources (thousands of nodes on modern super-computing substrates). Based on current computing trends, it may be several decades before numerical solvers of atmospheric physics can simulate at meter resolution [Schneider2017], which represents a grand challenge to scientific computing. Using SR to resolve fine-scale structures from coarser simulations holds enormous promise for fast, efficient, and accurate models for atmospheric physics emulation.
Figure 2: A cropped example snapshot of weather data. The task is to recover the HR representation from the corresponding LR input by a factor of $\times 16$. All SOTA methods reconstruct a blurred approximation that washes out important multi-scale and fine-scale features of physical importance.
Figure 3: High-resolution example snapshots included in SuperBench, showing a Navier-Stokes Kraichnan Turbulence fluid flow (left), weather data comprising several atmospheric variables (middle), and simulated cosmology hydrodynamics data (right).
Figure 4: The results of RFNE versus model parameters on four datasets considering scenario (i) with up-sampling factors $\times8$ and $\times16$.
Figure 5: Comparative results of different degradation methods. (a) shows the RFNE results from scenarios (i) and (ii) using the SwinIR model, with an up-sampling factor of $\times8$. (b-d) show the results of scenario (iii) with LR simulation data as inputs.
...and 12 more figures
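The captions for Figures 4 and 5 report RFNE, which is not spelled out on this page. Assuming it denotes the relative Frobenius norm error, $\mathrm{RFNE} = \lVert \hat{X} - X \rVert_F / \lVert X \rVert_F$ for a prediction $\hat{X}$ and a high-resolution reference $X$, a minimal sketch looks like the following. The function name and signature are hypothetical and this is not the benchmark's evaluation code.

```python
import numpy as np


def rfne(pred, target):
    """Relative Frobenius norm error between a prediction and a reference.

    Assumes RFNE = ||pred - target||_F / ||target||_F. For arrays with more
    than two dimensions, np.linalg.norm with default arguments flattens the
    input, which gives the same value as the Frobenius norm of the
    flattened field.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")

    denom = np.linalg.norm(target)
    if denom == 0.0:
        raise ValueError("reference field has zero norm")
    return float(np.linalg.norm(pred - target) / denom)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.standard_normal((64, 64))
    prediction = reference + 0.1 * rng.standard_normal((64, 64))
    print(f"RFNE: {rfne(prediction, reference):.3f}")
```

A perfect reconstruction gives 0, and predicting all zeros gives exactly 1, so values are often quoted as percentages of the reference field's norm.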
