Spatiotemporally adaptive compression for scientific dataset with feature preservation -- a case study on simulation data with extreme climate events analysis

Qian Gong, Chengzhu Zhang, Xin Liang, Viktor Reshniak, Jieyang Chen, Anand Rangarajan, Sanjay Ranka, Nicolas Vidal, Lipeng Wan, Paul Ullrich, Norbert Podhorszki, Robert Jacob, Scott Klasky

TL;DR

The paper tackles the storage bottleneck of high-resolution time-series simulations by introducing a spatiotemporally adaptive, error-bounded lossy compression framework that preserves key quantities of interest (QoIs). Built on MGARD with non-uniform error bounding and a region-aware buffer-zone strategy, it allows precision to be traded for temporal resolution and region-specific fidelity, improving downstream analyses such as Tropical Cyclone (TC) tracking in E3SM climate data. The authors demonstrate that storing more timesteps at reduced precision preserves QoIs better than simple timestep decimation, and that region-adaptive compression yields substantial gains in tracking accuracy at large compression ratios with modest overhead. The approach offers a practical way to significantly reduce data sizes while maintaining or improving the quality of climate-analysis pipelines that rely on spatiotemporal features.

Abstract

Scientific discoveries are increasingly constrained by limited storage space and I/O capacities. For time-series simulations and experiments, data often need to be decimated over timesteps to accommodate storage and I/O limitations. In this paper, we propose a technique that addresses storage costs while improving post-analysis accuracy through spatiotemporally adaptive, error-controlled lossy compression. We investigate the trade-off between data precision and temporal output rates, revealing that reducing data precision and increasing timestep frequency lead to more accurate analysis outcomes. Additionally, we integrate spatiotemporal feature detection with data compression and demonstrate that performing adaptive error-bounded compression in higher dimensional space enables greater compression ratios, leveraging the error propagation theory of a transformation-based compressor. To evaluate our approach, we conduct experiments using the well-known E3SM climate simulation code and apply our method to compress variables used for cyclone tracking. Our results show a significant reduction in storage size while enhancing the quality of cyclone tracking analysis, both quantitatively and qualitatively, in comparison to the prevalent timestep decimation approach. Compared to three state-of-the-art lossy compressors lacking feature preservation capabilities, our adaptive compression framework improves perfectly matched cases in TC tracking by 26.4-51.3% at medium compression ratios and by 77.3-571.1% at large compression ratios, with only 5-11% computational overhead.
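
The trade-off between data precision and temporal output rate described in the abstract can be illustrated with a toy example. The sketch below is not the paper's MGARD-based method: it uses a synthetic 1D signal, plain uniform quantization as a stand-in for error-bounded compression, and linear interpolation to fill in decimated timesteps. The signal shape, the stride of 6, and the error bound of 0.05 are all assumptions chosen for illustration, and the two reductions are not matched to the same storage budget.

```python
import numpy as np


def decimate_and_interpolate(signal, stride):
    """Keep every `stride`-th sample, then linearly interpolate the rest."""
    t = np.arange(signal.size)
    kept = t[::stride]
    return np.interp(t, kept, signal[kept])


def quantize(signal, error_bound):
    """Uniform scalar quantization; pointwise error stays within error_bound."""
    step = 2.0 * error_bound
    return np.round(signal / step) * step


def rmse(a, b):
    """Root-mean-square error between two equally sized arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Synthetic "hourly" series: a slow background cycle plus one short-lived
# peak centred at t = 123, which falls between the 6-hourly samples.
t = np.arange(240)
signal = np.sin(2 * np.pi * t / 240) + 3.0 * np.exp(-0.5 * ((t - 123) / 1.5) ** 2)

decimated = decimate_and_interpolate(signal, stride=6)  # 6-hourly output
quantized = quantize(signal, error_bound=0.05)          # hourly, lower precision

print("RMSE decimated:", rmse(signal, decimated))
print("RMSE quantized:", rmse(signal, quantized))
print("peak original / decimated / quantized:",
      signal.max(), decimated.max(), quantized.max())
```

In this toy setting the decimated series misses most of the short-lived peak because no retained sample lands on it, while the quantized series keeps every timestep within the chosen error bound. This mirrors the qualitative argument of the paper, not its quantitative results.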

Paper Structure (15 sections, 2 equations, 13 figures)

Figures (13)

  • Figure 1: Frequency distributions of precipitation intensity over two selected grid points within (a) Oklahoma and (b) Alaska using 15-minute and 3-hourly E3SM output.
  • Figure 2: Example Tropical Cyclone (TC) tracks detected in the hourly, hourly lossy compressed, and temporally decimated (6-hourly) data. With a higher temporal resolution, lossy compression causes fewer errors in the detected TC trajectory than temporal decimation does.
  • Figure 3: Compression ratios achieved on 4 variables output from the E3SM simulation at 15-minute, hourly, and 6-hourly rates when 120 timesteps of data are buffered in memory and compressed at once, using a relative RMSE of 1.0e-3.
  • Figure 4: The propagation of compression error induced by a quantization error at a node on a coarser level after the multilevel recomposition in 1D, 2D, and 3D space.
  • Figure 5: An illustration of the proposed two-step mesh refinement approach for critical region detection.
  • ...and 8 more figures