CheetahGIS: Architecting a Scalable and Efficient Streaming Spatial Query Processing System

Jiaping Cao, Ting Sun, Man Lung Yiu, Xiao Yan, Bo Tang

TL;DR

This work tackles the challenge of real-time spatial query processing over massive streams of moving objects, which requires low latency and high scalability. It introduces CheetahGIS, a modular streaming system built on Apache Flink Stateful Functions with a grid-based global index (Indexer) and per-cell Local Processors, plus Transformer, Aggregator, Metadata Synchronizer, and Load Balancer components that optimize throughput and latency. The paper presents a unified query-processing paradigm and several optimization techniques, including fine-grained resource management, many-to-one Local Processor execution, and adaptive load balancing with an imbalance-remedy heuristic, validated by extensive experiments on real and synthetic datasets. The results demonstrate high throughput and low latency across object, range-count, and kNN queries, with strong robustness to data skew and easy extensibility to user-defined queries, offering a practical solution for scalable, real-time spatial analytics on moving objects.
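To make the Indexer and Local Processor roles concrete, here is a minimal Python sketch of a uniform grid index that routes a location update first to a cell and then to a Local Processor. It assumes a fixed bounding box, a rows × cols grid, and a round-robin default mapping from cells to processors; the class and method names (`GridIndex`, `cell_of`, `processor_of`, `route`) are hypothetical and are not taken from CheetahGIS or the StateFun API. The `cell_to_proc` dictionary illustrates the many-to-one idea: several cells can share one Local Processor, and a load balancer could move a cell by overwriting its entry.

```python
from dataclasses import dataclass, field


@dataclass
class GridIndex:
    """Hypothetical uniform grid over a bounding box; cells are keyed by (row, col)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float
    rows: int
    cols: int
    num_processors: int
    # Many-to-one mapping: several cells may be served by the same Local Processor.
    cell_to_proc: dict = field(default_factory=dict)

    def cell_of(self, x: float, y: float) -> tuple:
        """Return the (row, col) cell containing (x, y), clamped to the grid bounds."""
        col = int((x - self.min_x) / (self.max_x - self.min_x) * self.cols)
        row = int((y - self.min_y) / (self.max_y - self.min_y) * self.rows)
        return (min(max(row, 0), self.rows - 1), min(max(col, 0), self.cols - 1))

    def processor_of(self, cell: tuple) -> int:
        """Look up a cell's Local Processor, assigning round-robin on first use."""
        if cell not in self.cell_to_proc:
            row, col = cell
            self.cell_to_proc[cell] = (row * self.cols + col) % self.num_processors
        return self.cell_to_proc[cell]

    def route(self, x: float, y: float) -> tuple:
        """Route a location update to (cell, processor), as an Indexer might."""
        cell = self.cell_of(x, y)
        return cell, self.processor_of(cell)
```

For example, `GridIndex(0.0, 0.0, 100.0, 100.0, rows=4, cols=4, num_processors=3).route(10.0, 90.0)` returns `((3, 0), 0)`: the point falls in row 3, column 0, and that cell (linear id 12) maps to processor 0 under round-robin.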

Abstract

Spatial data analytics systems are widely studied in both academia and industry. However, existing systems are limited in handling large numbers of moving objects and real-time spatial queries. In this work, we architect CheetahGIS, a scalable and efficient system for processing streaming spatial queries over massive numbers of moving objects. In particular, CheetahGIS is built upon Apache Flink Stateful Functions (StateFun), an API for building distributed streaming applications with an actor-like model. CheetahGIS enjoys excellent scalability due to its modular architecture, which cleanly decomposes the system into components and allows each component to be scaled individually. To improve the efficiency and scalability of CheetahGIS, we devise a suite of optimizations, e.g., a lightweight global grid-based index, metadata synchronization strategies, and load-balancing mechanisms. We also formulate a generic paradigm for spatial query processing in CheetahGIS and verify its generality by processing three representative streaming queries (i.e., object query, range count query, and k-nearest neighbor (kNN) query). We conduct extensive experiments on both real and synthetic datasets to evaluate CheetahGIS.
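The generic query-processing paradigm can be illustrated for the range count query as three steps: route the query to the relevant cells, compute partial results locally, then aggregate. The sketch below is a minimal single-process Python model under assumed simplifications: square grid cells of side `cell_size`, cells held as plain dictionaries, and synchronous function calls standing in for StateFun messages. All function names (`build_cells`, `cells_overlapping`, `local_range_count`, `range_count`) are hypothetical and not the paper's interfaces.

```python
def build_cells(points, cell_size):
    """Group {obj_id: (x, y)} into uniform grid cells keyed by (row, col)."""
    cells = {}
    for obj_id, (x, y) in points.items():
        cell = (int(y // cell_size), int(x // cell_size))
        cells.setdefault(cell, {})[obj_id] = (x, y)
    return cells


def cells_overlapping(rect, cell_size):
    """Indexer step: yield every (row, col) cell that intersects rect = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = rect
    for row in range(int(y1 // cell_size), int(y2 // cell_size) + 1):
        for col in range(int(x1 // cell_size), int(x2 // cell_size) + 1):
            yield (row, col)


def local_range_count(objects, rect):
    """Local Processor step: count one cell's objects that lie inside rect."""
    x1, y1, x2, y2 = rect
    return sum(1 for (x, y) in objects.values() if x1 <= x <= x2 and y1 <= y <= y2)


def range_count(cells, rect, cell_size):
    """Route the query to overlapping cells, count locally, then aggregate."""
    partials = [local_range_count(cells.get(cell, {}), rect)
                for cell in cells_overlapping(rect, cell_size)]
    return sum(partials)  # Aggregator step: combine the partial counts
```

With points `a=(1, 1)`, `b=(12, 3)`, `c=(25, 25)`, `d=(9, 9)` and `cell_size=10`, a query over `(0, 0, 15, 10)` visits only cells `(0, 0)`, `(0, 1)`, `(1, 0)`, `(1, 1)` and returns 3. The same route/compute/aggregate shape would carry over to other queries by changing the local function and the aggregation rule.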

Paper Structure

This paper contains 26 sections, 1 theorem, 2 equations, 11 figures, 2 tables, 1 algorithm.

Key Result

Lemma 1

The imbalance remedy problem is NP-hard.
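Because Lemma 1 rules out a polynomial-time exact algorithm unless P = NP, a heuristic is the practical route. The paper's own imbalance-remedy heuristic is not reproduced in this summary, so the sketch below substitutes a generic greedy rule: repeatedly move one cell from the most-loaded to the least-loaded Local Processor, picking the cell whose move brings the two loads closest, and stop once no single move narrows the gap. The load model (one number per cell, e.g., a recent update rate) and the function name `remedy_imbalance` are assumptions for illustration.

```python
def remedy_imbalance(cell_load, assignment, num_procs, max_moves=100):
    """Greedy sketch (not the paper's heuristic): shift cells from hot to cold processors.

    cell_load:  {cell_id: load}, e.g. an assumed recent update rate per cell
    assignment: {cell_id: processor_id}; the input dict is not modified
    """
    assignment = dict(assignment)
    for _ in range(max_moves):
        loads = [0.0] * num_procs
        for cell, proc in assignment.items():
            loads[proc] += cell_load[cell]
        hot = max(range(num_procs), key=loads.__getitem__)
        cold = min(range(num_procs), key=loads.__getitem__)
        gap = loads[hot] - loads[cold]
        # Only cells with 0 < load < gap leave both processors below the old hot load.
        candidates = [c for c, p in assignment.items()
                      if p == hot and 0 < cell_load[c] < gap]
        if not candidates:
            break
        # Choose the move that makes the hot and cold loads as equal as possible.
        best = min(candidates, key=lambda c: abs(gap - 2 * cell_load[c]))
        assignment[best] = cold
    return assignment
```

For four cells with loads 5, 4, 3, and 1 all placed on processor 0 of 2, the sketch moves the load-5 and load-1 cells to processor 1, ending at loads 7 and 6. Each accepted move of a cell with load w < gap lowers the sum of squared loads by 2w(gap − w), so the loop would terminate even without `max_moves`; the cap only bounds the work done per rebalancing round.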

Figures (11)

  • Figure 1: Architecture of CheetahGIS
  • Figure 2: Grid-based index
  • Figure 3: Many-to-one execution mode of Local Processor
  • Figure 4: Imbalance remedy example
  • Figure 5: Query processing in CheetahGIS
  • ...and 6 more figures

Theorems & Definitions (5)

  • Lemma 1
  • Proof
  • Definition 1
  • Definition 2
  • Definition 3