Parallelizing the Computation of Robustness for Measuring the Strength of Tuples

Davide Martinenghi

TL;DR

This work tackles computing grid resistance, a robustness indicator, for skyline tuples by adapting established partitioning-based parallel skyline algorithms. It formalizes skyline dominance, grid projection, and grid resistance, analyzes three partitioning strategies (Grid, Angular, Sliced), and presents a practical algorithm that computes $\mathsf{gres}$ by repeatedly recomputing skylines on grid-projected data. Through extensive experiments on synthetic and real datasets, it demonstrates that parallelization yields meaningful speedups, with Sliced often providing the most stable performance, and that overly aggressive partitioning or representative filtering offers limited benefits. The results suggest practical guidelines for implementing parallel robustness computations and point to potential extensions to other dominance-based indicators.
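
The partitioning-based parallel skyline scheme that this work builds on can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the paper's implementation: smaller attribute values are taken to be better, Sliced partitioning is read as sorting on one attribute and cutting the sorted list into $p$ slices of near-equal size, and each local skyline is computed with a simple block-nested-loop window before a final merge. The helper names (`dominates`, `local_skyline`, `sliced_partitions`, `parallel_skyline`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Assumption: smaller is better. a dominates b if it is no worse in every
    attribute and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def local_skyline(points: Sequence[Point]) -> List[Point]:
    """Block-nested-loop skyline of a single partition."""
    window: List[Point] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated by a tuple already in the window
        # p survives: evict the window tuples it dominates, then keep it
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


def sliced_partitions(points: Sequence[Point], p: int, dim: int = 0) -> List[List[Point]]:
    """Hypothetical Sliced strategy: sort on one attribute, cut into <= p equal-size slices."""
    ordered = sorted(points, key=lambda t: t[dim])
    if not ordered:
        return []
    size = -(-len(ordered) // p)  # ceiling division, so at most p slices
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def parallel_skyline(points: Sequence[Point], p: int = 4) -> List[Point]:
    parts = sliced_partitions(points, p)
    # Phase 1: one local-skyline task per partition, run concurrently
    with ThreadPoolExecutor(max_workers=p) as pool:
        local_skies = list(pool.map(local_skyline, parts))
    # Phase 2: the global skyline is the skyline of the union of local skylines
    candidates = [t for sky in local_skies for t in sky]
    return local_skyline(candidates)
```

A thread pool keeps the sketch self-contained; in CPython, CPU-bound dominance tests would normally be dispatched to a `ProcessPoolExecutor` (with picklable top-level functions) to obtain real speedups. The merge step is sound because a tuple dominated inside its own partition is also dominated globally, and dominance is transitive.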

Abstract

Several indicators have recently been proposed for measuring various characteristics of the tuples of a dataset, particularly the so-called skyline tuples, i.e., those that are not dominated by other tuples. Numeric indicators are important because they may, e.g., provide an additional criterion for ranking skyline tuples and focusing on a subset thereof. We concentrate on an indicator of robustness that may be measured for any skyline tuple $t$: grid resistance, i.e., how large value perturbations can be while $t$ remains non-dominated (and thus in the skyline). Computing this indicator typically involves one or more rounds of computation of the skyline itself or, at least, of dominance relationships. Building on recent advances in partitioning strategies that allow a parallel computation of skylines, we discuss how these strategies can be adapted to the computation of the indicator.
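
To make the indicator concrete, the following Python sketch computes a grid-resistance-like value for every skyline tuple. It rests on assumptions that are not taken from the paper's Definitions: attribute values are normalized to $[0,1]$ with smaller being better, the grid projection with step $g$ maps each value $v$ to the cell index $\lceil v/g \rceil$, and $\mathsf{gres}(t)$ is taken as the smallest step, among a given list of candidate steps, at which the projection of $t$ is dominated by the projection of another tuple. The names `project` and `grid_resistance` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Assumption: smaller is better; a is no worse everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def project(t: Point, g: float) -> Tuple[int, ...]:
    """Hypothetical grid projection: index of the step-g cell holding each value (rounding up)."""
    return tuple(math.ceil(v / g) for v in t)


def grid_resistance(data: Sequence[Point], steps: Sequence[float]) -> Dict[int, Optional[float]]:
    """Map the index of each skyline tuple to the smallest candidate step g at which
    its projection becomes dominated (None if it survives every candidate step)."""
    n = len(data)
    sky: List[int] = [i for i in range(n)
                      if not any(dominates(data[j], data[i]) for j in range(n))]
    gres: Dict[int, Optional[float]] = {i: None for i in sky}
    for g in sorted(steps):
        pending = [i for i in sky if gres[i] is None]
        if not pending:
            break  # every skyline tuple has already left the projected skyline
        proj = [project(t, g) for t in data]  # re-project the whole dataset for this step
        for i in pending:
            # i drops out of the projected skyline if another projection dominates it
            if any(dominates(proj[j], proj[i]) for j in range(n) if j != i):
                gres[i] = g
    return gres
```

Each candidate step requires re-projecting the whole dataset and re-testing dominance, which is why the cost of the indicator is driven by repeated skyline-style computations and why parallel partitioning is attractive. Comparing integer cell indices $\lceil v/g \rceil$ instead of the snapped values $\lceil v/g \rceil \cdot g$ preserves the dominance order, since $g > 0$.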

Paper Structure

This paper contains 8 sections, 3 equations, 7 figures, 1 table, and 1 algorithm.

Figures (7)

  • Figure 1: Partitioning Strategies illustrated on a uniform dataset.
  • Figure 2: Number of dominance tests incurred by the various partitioning strategies with a default number of partitions ($p=16$) and varying dataset sizes on the ANT and UNI datasets.
  • Figure 3: Number of dominance tests with a default number of partitions ($p=16$) as the number of dimensions varies on the ANT and UNI datasets with $N=1$M tuples.
  • Figure 4: Number of dominance tests as the number of partitions varies on the ANT and UNI 3D datasets with $N=1$M tuples.
  • Figure 5: Number of dominance tests with a default number of representatives ($rep=16$) as the number of partitions varies on the ANT and UNI 3D datasets with $N=1$M tuples.
  • ...and 2 more figures
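
Figures 2 to 5 compare the partitioning strategies by the number of dominance tests they incur. A hedged Python sketch of how such counts could be collected follows; it is not the paper's experimental code. It assumes data in $[0,1]^d$ with smaller values better, reads Grid as cutting every dimension into $k$ equal-width intervals and Angular as cutting each of the $d-1$ hyperspherical angles into $k$ equal-width bins, and counts every call to the dominance test during a two-phase (local, then merge) skyline computation. The names `grid_key`, `angular_key`, and `skyline_counting_tests` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Point = Tuple[float, ...]


def grid_key(t: Point, k: int) -> Tuple[int, ...]:
    """Grid: cut every dimension of [0, 1] into k equal-width intervals (k**d cells)."""
    return tuple(min(int(v * k), k - 1) for v in t)


def angular_key(t: Point, k: int) -> Tuple[int, ...]:
    """Angular: cut each of the d-1 hyperspherical angles, which range over
    [0, pi/2] for non-negative data, into k equal-width bins (k**(d-1) cells)."""
    key = []
    for i in range(len(t) - 1):
        rest = math.sqrt(sum(v * v for v in t[i + 1:]))
        phi = math.atan2(rest, t[i])
        key.append(min(int(phi / (math.pi / 2) * k), k - 1))
    return tuple(key)


def skyline_counting_tests(points: Sequence[Point],
                           key: Callable[[Point], Hashable]) -> Tuple[List[Point], int]:
    """Two-phase skyline over the partitions induced by `key`; returns the
    skyline and the number of dominance tests performed (smaller is better)."""
    tests = 0

    def dominates(a: Point, b: Point) -> bool:
        nonlocal tests
        tests += 1  # count every dominance test actually executed
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    def bnl(pts: Sequence[Point]) -> List[Point]:
        window: List[Point] = []
        for p in pts:
            if any(dominates(w, p) for w in window):
                continue
            window = [w for w in window if not dominates(p, w)]
            window.append(p)
        return window

    parts: Dict[Hashable, List[Point]] = defaultdict(list)
    for t in points:
        parts[key(t)].append(t)
    local_skies = [bnl(part) for part in parts.values()]   # phase 1: per-partition skylines
    result = bnl([t for sky in local_skies for t in sky])  # phase 2: merge
    return result, tests
```

Passing, e.g., `lambda t: grid_key(t, 4)` or `lambda t: angular_key(t, 4)` as `key` lets the two strategies be compared on the same data. Angular cells group tuples by direction rather than by position, so each partition tends to contain some tuples close to the skyline, which is the usual motivation for angle-based schemes.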

Theorems & Definitions (3)

  • Definition 1
  • Definition 2
  • Definition 3: Grid dominance