Error-controlled Progressive Retrieval of Scientific Data under Derivable Quantities of Interest

Xuan Wu, Qian Gong, Jieyang Chen, Qing Liu, Norbert Podhorszki, Xin Liang, Scott Klasky

TL;DR

A progressive data retrieval framework with guaranteed error control on derivable QoIs, which leads to over 2.02x performance gain in data transfer tasks compared to transferring the primary data while guaranteeing a QoI error that is less than 1E-5.

Abstract

The unprecedented volume of scientific data places heavy pressure on current data storage and transmission systems. Progressive compression, which offers data access with on-demand precision, has been proposed to mitigate this problem. However, existing approaches control precision only on the primary data, leaving the error in quantities of interest (QoIs) derived from it uncertain. In this work, we present a progressive data retrieval framework with guaranteed error control on derivable QoIs. Our contributions are three-fold. (1) We derive the theory needed to strictly control QoI errors during progressive retrieval. The theory is generic and applies to any QoI that can be composed from the basis of derivable QoIs proved in the paper. (2) We design and develop a generic progressive retrieval framework based on the proposed theory, and optimize it by exploring feasible progressive representations. (3) We evaluate our framework using five real-world datasets with a diverse set of QoIs. Experiments demonstrate that our framework faithfully respects any user-specified QoI error bound in the evaluated applications. This leads to over 2.02x performance gain in data transfer tasks compared to transferring the primary data, while guaranteeing a QoI error below 1E-5.

Paper Structure

This paper contains 22 sections, 11 theorems, 1 equation, 9 figures, 4 tables, 4 algorithms.

Key Result

Theorem 1

[Polynomials] An upper bound of $\Delta(f, x, \epsilon)$ for the function $f(x) = x^n$ can be written as $\Delta(f, x, \epsilon) \leq \sum_{i=1}^{n}{C_n^i \lvert x\rvert^{n-i}\epsilon^i}$, where $C_n^i=\frac{n!}{(n-i)!\,i!}$ is the binomial coefficient.
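
The bound can be checked numerically. The sketch below assumes that $\Delta(f, x, \epsilon)$ denotes the largest possible $\lvert f(x') - f(x)\rvert$ over all $x'$ with $\lvert x' - x\rvert \leq \epsilon$; this summary does not reproduce the paper's definition, so that reading is an assumption. The function names `poly_error_bound` and `actual_error` are hypothetical and are not taken from the authors' framework.

```python
from math import comb


def poly_error_bound(x: float, eps: float, n: int) -> float:
    """Theorem 1 bound: sum_{i=1}^{n} C(n, i) * |x|^(n-i) * eps^i."""
    return sum(comb(n, i) * abs(x) ** (n - i) * eps ** i for i in range(1, n + 1))


def actual_error(x: float, x_approx: float, n: int) -> float:
    """Actual QoI error |x_approx^n - x^n| for one perturbed value."""
    return abs(x_approx ** n - x ** n)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper's datasets.
    x, eps, n = -1.5, 1e-3, 3
    bound = poly_error_bound(x, eps, n)
    # Every perturbation within [-eps, eps] should stay under the bound
    # (a small relative slack absorbs floating-point rounding).
    for delta in (-eps, -eps / 2, 0.0, eps / 2, eps):
        assert actual_error(x, x + delta, n) <= bound * (1 + 1e-9)
    print(f"bound = {bound:.6e}")
```

By the binomial theorem the sum equals $(\lvert x\rvert + \epsilon)^n - \lvert x\rvert^n$, so the bound is attained when the perturbation pushes $x$ away from zero by exactly $\epsilon$.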

Figures (9)

  • Figure 1: Workflow of the proposed QoI-preserving framework with three key modules. We assume that data is refactored and stored in storage systems when generated, and our framework is able to progressively retrieve data from storage while guaranteeing user-specified QoI error bounds. This is extremely useful when data movement becomes the performance bottleneck, which is the case when data is located in secondary or remote storage systems.
  • Figure 2: Requested error and the resulting bitrate for different error-controlled progressive compressors.
  • Figure 3: Impact of decomposition basis on GE-small data. OB and HB represent PMGARD and PMGARD-HB, respectively.
  • Figure 4: Max estimated and max actual QoI errors under given requested QoI errors of PMGARD-HB on GE-small.
  • Figure 5: Max estimated and max actual QoI errors under given requested QoI errors of PMGARD-HB on NYX and Hurricane.
  • ...and 4 more figures

Theorems & Definitions (16)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Definition 5
  • Theorem 4
  • Theorem 5
  • ...and 6 more