Mining Area Skyline Objects from Map-based Big Data using Apache Spark Framework

Chen Li; Ye Zhu; Yang Cao; Jinli Zhang; Annisa Annisa; Debo Cheng; Yasuhiko Morimoto

TL;DR

The paper tackles the computational cost of area skyline queries on map-based big data. It introduces an Apache Spark-based distributed algorithm that uses three key techniques (local partial skyline extraction, driver-side filter creation, and executor-side filtering) to reduce intermediate data and accelerate skyline computation. Empirical results on eight synthetic datasets show substantial reductions in execution time and data volume, with gains increasing as grid sizes and facility counts grow, demonstrating the method's scalability and practical relevance for location-based decision support. The work highlights Spark's suitability for large-scale, multi-criteria spatial queries and points to real-world applications in spatial decision-making and related domains.

Abstract

Skyline computation provides a mechanism for using multiple location-based criteria to identify optimal data points. However, these computations become less efficient and harder to carry out as the input data grows. This study presents a novel algorithm that mitigates this challenge by using Apache Spark, a distributed processing platform, for area skyline computation. The proposed algorithm improves processing speed and scalability. In particular, our algorithm comprises three key phases: the computation of distances between data points, the generation of distance tuples, and the execution of the skyline operators. Notably, the second phase employs a local partial skyline extraction technique to minimize the volume of data transmitted from each executor (a parallel processing procedure) to the driver (a central processing procedure). Afterwards, the driver processes the received data to determine the final skyline and creates filters to exclude irrelevant points. Extensive experiments on eight datasets show that our algorithm significantly reduces both the data size and the computation time required for area skyline computation.
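
The executor/driver split described above can be illustrated with a minimal sketch in plain Python. It is not the paper's implementation and does not use Spark: the partitions are ordinary lists standing in for executors, and the names `dominates`, `skyline`, `local_then_global`, and `apply_filter` are hypothetical. It assumes each distance tuple holds one distance per facility type for a grid cell, with smaller values being better, and it uses the global skyline itself as the filter, which is only one possible choice.

```python
from typing import List, Sequence, Tuple

# Assumption: one distance tuple per grid cell, smaller is better in every dimension.
Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is no worse in every dimension and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def local_then_global(partitions: Sequence[Sequence[Point]]):
    """Simulate executors sending only local skylines to a driver that merges them."""
    # "Executor" side: each partition reduces itself to its local skyline.
    local_skylines = [skyline(part) for part in partitions]
    # "Driver" side: merge the much smaller local results and take the skyline again.
    merged = [p for part in local_skylines for p in part]
    return local_skylines, skyline(merged)


def apply_filter(partition: Sequence[Point], filter_points: Sequence[Point]) -> List[Point]:
    """Drop tuples dominated by any filter point (hypothetical driver-created filter)."""
    return [p for p in partition if not any(dominates(f, p) for f in filter_points)]


partitions = [
    [(1.0, 4.0), (2.0, 5.0), (3.0, 1.0)],
    [(2.0, 2.0), (4.0, 4.0), (0.5, 6.0)],
]
local_skylines, global_skyline = local_then_global(partitions)
remaining = apply_filter([(2.5, 2.5), (0.4, 7.0)], global_skyline)
```

Because a point dominated inside its own partition can never appear in the global skyline, merging local skylines gives the same result as running the skyline operator over all tuples at once, while less data moves to the driver.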

Paper Structure (21 sections, 7 figures, 4 tables)

This paper contains 21 sections, 7 figures, 4 tables.

Introduction
Related Work
Skyline Query
Skyline Processing in Distributed Framework
Area Skyline Computation with Hadoop MapReduce
Preliminary
Area Skyline Computation
Distributed MapReduce Algorithm for Area Skyline Computation
Area Skyline Computation with Apache Spark Framework
Local Partial Skyline Extraction
Filter Creation at Driver
Filtering in Each Executor
Experiments
Experimental Configuration
Evaluation Datasets
...and 6 more sections

Figures (7)

Figure 1: Example of area skyline in a map.
Figure 2: An example of a distributed algorithm for area skyline computation.
Figure 3: Overview architecture of the proposed Apache Spark-based area skyline computation algorithm.
Figure 4: An example of local partial skylines and their dominant areas.
Figure 5: Relationship between the number of grids and facilities and the execution time.
...and 2 more figures

Theorems & Definitions (2)

Definition 1
Definition 2
