CheetahGIS: Architecting a Scalable and Efficient Streaming Spatial Query Processing System
Jiaping Cao, Ting Sun, Man Lung Yiu, Xiao Yan, Bo Tang
TL;DR
This work tackles real-time spatial query processing over massive streams of moving objects, a setting that demands both low latency and high scalability. It introduces CheetahGIS, a modular streaming system built on Apache Flink Stateful Functions. The system combines a grid-based global index (Indexer) with per-cell Local Processors, and adds a Transformer, Aggregator, Metadata Synchronizer, and Load Balancer to optimize throughput and latency. The paper presents a unified query-processing paradigm and several optimization techniques, including fine-grained resource management, many-to-one Local Processor execution, and adaptive load balancing with an imbalance-remedy heuristic, validated by extensive experiments on real and synthetic datasets. The results demonstrate high throughput and low latency across object, range-count, and kNN queries, with strong robustness to data skew and easy extensibility to user-defined queries, offering a practical solution for scalable, real-time spatial analytics on moving objects.
Abstract
Spatial data analytics systems are widely studied in both academia and industry. However, existing systems are limited when handling a large number of moving objects and real-time spatial queries. In this work, we architect a scalable and efficient system, CheetahGIS, to process streaming spatial queries over massive moving objects. In particular, CheetahGIS is built upon Apache Flink Stateful Functions (StateFun), an API for building distributed streaming applications with an actor-like model. CheetahGIS enjoys excellent scalability due to its modular architecture, which clearly decomposes different components and allows scaling individual components. To improve the efficiency and scalability of CheetahGIS, we devise a suite of optimizations, e.g., a lightweight global grid-based index, metadata synchronization strategies, and load balancing mechanisms. We also formulate a generic paradigm for spatial query processing in CheetahGIS, and verify its generality by processing three representative streaming queries (i.e., object query, range count query, and k nearest neighbor query). We conduct extensive experiments on both real and synthetic datasets to evaluate CheetahGIS.
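
To make the idea of a global grid-based index feeding per-cell processors concrete, the following is a minimal single-process sketch in plain Python. It is not the CheetahGIS implementation and does not use the StateFun API; the names GridIndex, LocalProcessor, and MiniPipeline, the uniform grid layout, and the closed-rectangle range semantics are all assumptions made for illustration. It shows location updates being routed to the cell that contains them, and a range count query being fanned out to the overlapping cells and summed.

```python
from dataclasses import dataclass


@dataclass
class GridIndex:
    """Hypothetical uniform grid over [min_x, max_x] x [min_y, max_y]."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float
    cols: int
    rows: int

    def cell_of(self, x, y):
        # Map a point to its (column, row) cell, clamping to the grid border.
        cx = int((x - self.min_x) / (self.max_x - self.min_x) * self.cols)
        cy = int((y - self.min_y) / (self.max_y - self.min_y) * self.rows)
        return (min(max(cx, 0), self.cols - 1), min(max(cy, 0), self.rows - 1))

    def cells_overlapping(self, x1, y1, x2, y2):
        # All cells touched by the rectangle with corners (x1, y1) and (x2, y2).
        c1, c2 = self.cell_of(x1, y1), self.cell_of(x2, y2)
        return [(cx, cy)
                for cx in range(c1[0], c2[0] + 1)
                for cy in range(c1[1], c2[1] + 1)]


class LocalProcessor:
    """Hypothetical per-cell state: latest position of each object in the cell."""

    def __init__(self):
        self.objects = {}

    def upsert(self, oid, x, y):
        self.objects[oid] = (x, y)

    def remove(self, oid):
        self.objects.pop(oid, None)

    def range_count(self, x1, y1, x2, y2):
        return sum(1 for (x, y) in self.objects.values()
                   if x1 <= x <= x2 and y1 <= y <= y2)


class MiniPipeline:
    """Routes updates by cell and aggregates partial query results."""

    def __init__(self, index):
        self.index = index
        self.processors = {}   # cell -> LocalProcessor
        self.last_cell = {}    # object id -> cell it was last routed to

    def on_location_update(self, oid, x, y):
        cell = self.index.cell_of(x, y)
        old = self.last_cell.get(oid)
        if old is not None and old != cell:
            # The object crossed a cell boundary: drop it from the old cell.
            self.processors[old].remove(oid)
        self.processors.setdefault(cell, LocalProcessor()).upsert(oid, x, y)
        self.last_cell[oid] = cell

    def range_count(self, x1, y1, x2, y2):
        # Fan out to overlapping cells, then sum the partial counts.
        total = 0
        for cell in self.index.cells_overlapping(x1, y1, x2, y2):
            proc = self.processors.get(cell)
            if proc is not None:
                total += proc.range_count(x1, y1, x2, y2)
        return total


if __name__ == "__main__":
    pipeline = MiniPipeline(GridIndex(0, 0, 100, 100, cols=4, rows=4))
    pipeline.on_location_update("a", 10, 10)
    pipeline.on_location_update("b", 30, 10)
    print(pipeline.range_count(0, 0, 40, 40))
```

In a distributed deployment each cell's state would live in a separate stateful function instance and the final summation would be done by a dedicated aggregation step; the sketch collapses both into one process so that only the routing logic is visible.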
