Optimization Strategies for Parallel Computation of Skylines

Paolo Ciaccia, Davide Martinenghi

TL;DR

This paper proposes two orthogonal optimization strategies for reducing the computational overhead of skyline computation, and compares them experimentally in a multi-core environment equipped with PySpark.

Abstract

Skyline queries are one of the most widely adopted tools for Multi-Criteria Analysis, with applications covering diverse domains, including, e.g., Database Systems, Data Mining, and Decision Making. Skylines indeed offer a useful overview of the most suitable alternatives in a dataset, while discarding all the options that are dominated by (i.e., worse than) others. The intrinsically quadratic complexity associated with skyline computation has pushed researchers to identify strategies for parallelizing the task, particularly by partitioning the dataset at hand. In this paper, after reviewing the main partitioning approaches available in the relevant literature, we propose two orthogonal optimization strategies for reducing the computational overhead, and compare them experimentally in a multi-core environment equipped with PySpark.

Paper Structure

This paper contains 15 sections, 2 theorems, 10 equations, 7 figures, 1 table, 2 algorithms.

Key Result

Proposition 1

Let $r = r_1 \cup \ldots \cup r_p$, with $r_i \cap r_j = \emptyset$ for $i \neq j$. Then $\textsc{Sky}(r) = \textsc{Sky}(\textsc{Sky}(r_1) \cup \ldots \cup \textsc{Sky}(r_p))$.
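The proposition can be illustrated with a minimal sketch in plain Python (not PySpark, and not the authors' implementation). It assumes that lower values are better on every attribute and uses a naive quadratic skyline. The names `dominates`, `skyline`, and `partitioned_skyline`, as well as the sample data, are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(s: Point, t: Point) -> bool:
    """s dominates t: no worse on every attribute, strictly better on at least one.

    Assumption: lower values are preferred on all attributes.
    """
    return all(a <= b for a, b in zip(s, t)) and any(a < b for a, b in zip(s, t))


def skyline(points: Iterable[Point]) -> List[Point]:
    """Naive quadratic skyline: keep the tuples that no other tuple dominates."""
    pts = list(points)
    return [t for t in pts if not any(dominates(s, t) for s in pts)]


def partitioned_skyline(partitions: Sequence[Iterable[Point]]) -> List[Point]:
    """Compute a local skyline per partition, then the skyline of their union."""
    local = [t for part in partitions for t in skyline(part)]
    return skyline(local)


# Hypothetical two-attribute dataset, split into three disjoint partitions.
data = [(1, 9), (2, 7), (3, 8), (4, 4), (5, 6), (6, 2), (7, 3), (8, 1), (9, 5)]
parts = [data[0:3], data[3:6], data[6:9]]

# Both routes yield the same set of tuples, as the proposition states.
assert sorted(partitioned_skyline(parts)) == sorted(skyline(data))
```

In this sketch each call to `skyline` on a partition is independent of the others, which is what makes the local phase a candidate for parallel execution; only the final merge step sees tuples from more than one partition.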

Figures (7)

  • Figure 1: An example dataset of used cars. The filled disks represent tuples in the skyline. The gray area is the dominance region of car C4.
  • Figure 2: Partitioning Strategies. First row: uniform dataset. Second row: anticorrelated dataset.
  • Figure 3: Effectiveness of representative filtering: Sorted and Region strategies.
  • Figure 4: Performance of different partitioning strategies on an anticorrelated dataset with $d=4$ dimensions and varying sizes: execution times for computing the global skyline and the local skylines; numbers of tuples in the local skylines.
  • Figure 5: Performance of improved partitioning strategies on the anticorrelated dataset (ANT) with varying sizes: execution times for computing the global skyline with Sliced/Sliced+, Angular/Angular+, and all improved algorithms.
  • ...and 2 more figures

Theorems & Definitions (8)

  • Definition 1
  • Definition 2
  • Example 1
  • Proposition 1
  • Definition 3: Grid dominance
  • Definition 4
  • Proposition 2
  • Definition 5: Weak grid dominance