Mining Area Skyline Objects from Map-based Big Data using Apache Spark Framework

Chen Li, Ye Zhu, Yang Cao, Jinli Zhang, Annisa Annisa, Debo Cheng, Yasuhiko Morimoto

TL;DR

The paper tackles the computational cost of area skyline queries on map-based big data. It introduces an Apache Spark-based distributed algorithm that uses three key techniques (local partial skyline extraction, driver-side filter creation, and executor-side filtering) to reduce intermediate data and accelerate skyline computation. Empirical results on eight synthetic datasets show substantial reductions in execution time and data volume, with gains increasing as grid sizes and facility counts grow, which demonstrates the method's scalability and its practical relevance for location-based decision support. The work highlights Spark's suitability for large-scale, multi-criteria spatial queries and points to real-world deployments in spatial decision-making and related domains.

Abstract

Skyline computation provides a mechanism for identifying optimal data points under multiple location-based criteria. However, these computations become slower and more challenging as the input data grows. This study presents a novel algorithm that mitigates this challenge by using Apache Spark, a distributed processing platform, for area skyline computation, improving both processing speed and scalability. The algorithm comprises three key phases: computing the distances between data points, generating distance tuples, and executing the skyline operators. Notably, the second phase employs a local partial skyline extraction technique to minimize the volume of data transmitted from each executor (a parallel processing procedure) to the driver (a central processing procedure). The driver then processes the received data to determine the final skyline and creates filters to exclude irrelevant points. Extensive experiments on eight datasets show that the algorithm significantly reduces both the data size and the computation time required for area skyline computation.
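
The executor and driver roles described in the abstract can be illustrated with a minimal, single-process sketch. This is not the authors' implementation and does not use Spark: plain Python lists stand in for executor partitions, and the names `dominates`, `skyline`, `apply_filter`, and `distributed_skyline` are hypothetical. It assumes each point is a tuple of distances where smaller is better in every dimension, and it assumes a filter is simply a set of skyline points used to discard dominated points; the abstract does not specify the actual filter structure.

```python
from typing import List, Sequence, Tuple

# One distance tuple per candidate; smaller is assumed better in every dimension.
Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def apply_filter(points: List[Point], filter_points: List[Point]) -> List[Point]:
    """Assumed filter form: drop points dominated by any filter point."""
    return [p for p in points if not any(dominates(f, p) for f in filter_points)]


def distributed_skyline(partitions: List[List[Point]]) -> List[Point]:
    # "Executor" step: each partition is reduced to its local partial skyline,
    # so only non-dominated candidates are sent onward.
    local_skylines = [skyline(part) for part in partitions]
    # "Driver" step: merge the smaller local results and compute the final skyline.
    merged = [p for part in local_skylines for p in part]
    return skyline(merged)
```

For example, with the partitions `[[(1, 5), (2, 2), (3, 3)], [(5, 1), (4, 4), (2, 6)]]`, the first partition sends only `(1, 5)` and `(2, 2)` to the driver, because `(2, 2)` dominates `(3, 3)` locally. The final skyline is correct because any point dominated within its own partition is also dominated in the full dataset, so discarding it early loses nothing.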

Paper Structure

This paper contains 21 sections, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Example of area skyline in a map.
  • Figure 2: An example of a distributed algorithm for area skyline computation.
  • Figure 3: Overview architecture of the proposed Apache Spark-based area skyline computation algorithm.
  • Figure 4: An example of local partial skylines and their dominant areas.
  • Figure 5: Relationship between the number of grids and facilities and the execution time.
  • ...and 2 more figures

Theorems & Definitions (2)

  • Definition 1
  • Definition 2