Multi-Entry Generalized Search Trees for Indexing Trajectories
Maxime Schoemans, Walid G. Aref, Esteban Zimányi, Mahmoud Sakr
TL;DR
This work addresses the inefficiency of single-entry generalized indices for complex objects by introducing MGiST and MSP-GiST, which decompose an object into multiple index entries through a pluggable ExtractValue module. The authors instantiate multi-entry variants of the R-Tree, Quad-Tree, and KD-Tree for trajectory data and evaluate them on synthetic BerlinMOD and real AIS datasets, reporting speedups of up to an order of magnitude for point, range, and KNN queries. The approach trades a larger index and more complex insertion for a more selective index filter and faster queries, and it generalizes to composite data types other than trajectories. The framework preserves the GiST/SP-GiST search semantics while adding result de-duplication and tailored splitting strategies, and open-source implementations are provided.
Abstract
The idea of generalized indices is one of the success stories of database systems research, and it has found its way into common database systems. GiST (Generalized Search Tree) and SP-GiST (Space-Partitioned Generalized Search Tree) are two widely used generalized indices, typically applied to multidimensional data. Currently, GiST and SP-GiST represent each database object with a single index entry, e.g., one bounding box per spatio-temporal object. However, for complex objects such as moving-object trajectories, a single entry per object is inadequate for building efficient indices. Previous research has shown that splitting trajectories into multiple bounding boxes before indexing can enhance query performance, because the smaller boxes let the index filter out more non-matching objects. In this paper, we introduce MGiST and MSP-GiST, the multi-entry counterparts of GiST and SP-GiST, respectively, which are designed to partition objects into multiple entries during insertion. The methods for decomposing a complex object into multiple sub-objects differ from one data type to another and may depend on domain-specific parameters. Thus, MGiST and MSP-GiST allow pluggable modules that optimize the split of an object into multiple sub-objects. We demonstrate the usefulness of MGiST and MSP-GiST in a trajectory indexing scenario, where we realize several trajectory indexes using MGiST and MSP-GiST and instantiate these search trees with trajectory-specific splitting algorithms. We create and test multi-entry versions of several widely used spatial index structures, e.g., R-Tree, Quad-Tree, and KD-Tree. We conduct evaluations on both synthetic and real-world data and observe up to an order-of-magnitude improvement in the performance of point, range, and KNN queries.
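To make the multi-entry idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a naive splitting rule (equal-sized runs of consecutive segments) and replaces the tree with a flat list of entries, so it only illustrates how decomposing an object into several bounding boxes tightens the filter step and why results must be de-duplicated. The names `extract_value`, `MultiEntryIndex`, and `range_candidates` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Box:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def overlaps(self, other: "Box") -> bool:
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)


def bounding_box(points: List[Point]) -> Box:
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return Box(min(xs), min(ys), max(xs), max(ys))


def extract_value(points: List[Point], max_entries: int) -> List[Box]:
    """Hypothetical splitting rule (an assumption, not the paper's algorithm):
    cut the trajectory into at most `max_entries` runs of consecutive
    segments and return one bounding box per run."""
    n_segments = len(points) - 1
    k = max(1, min(max_entries, n_segments))
    boxes = []
    for i in range(k):
        start = i * n_segments // k
        end = (i + 1) * n_segments // k
        boxes.append(bounding_box(points[start:end + 1]))
    return boxes


class MultiEntryIndex:
    """Flat stand-in for a multi-entry index: one object, many entries."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        self.entries: List[Tuple[Box, int]] = []

    def insert(self, obj_id: int, points: List[Point]) -> None:
        for box in extract_value(points, self.max_entries):
            self.entries.append((box, obj_id))

    def range_candidates(self, query: Box) -> List[int]:
        # An object may match through several entries, so report it once.
        seen = set()
        result = []
        for box, obj_id in self.entries:
            if obj_id not in seen and box.overlaps(query):
                seen.add(obj_id)
                result.append(obj_id)
        return result


# Trajectory 1 is L-shaped and never enters the query window;
# trajectory 2 passes through it.
trajectories = {
    1: [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)],
    2: [(0.0, 8.0), (3.0, 7.0), (6.0, 6.0)],
}
query = Box(1.0, 5.0, 4.0, 9.0)

single = MultiEntryIndex(max_entries=1)  # behaves like one box per object
multi = MultiEntryIndex(max_entries=2)
for obj_id, pts in trajectories.items():
    single.insert(obj_id, pts)
    multi.insert(obj_id, pts)

print(single.range_candidates(query))  # [1, 2]: trajectory 1 is a false positive
print(multi.range_candidates(query))   # [2]
print(len(single.entries), len(multi.entries))  # 2 4
```

In this toy example the single-entry variant returns trajectory 1 as a candidate because the query window falls inside its large bounding box, whereas the two-entry variant filters it out, at the cost of storing twice as many entries. Trajectory 2 matches through both of its entries and is reported once, which is the role de-duplication plays in a multi-entry index.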
