Tradeoffs in Processing Queries and Supporting Updates over an ML-Enhanced R-tree

Abdullah Al-Mamun, Ch. Md. Rakin Haider, Jianguo Wang, Walid G. Aref

TL;DR

This work introduces the AI+R-tree, a hybrid index that augments the traditional disk-based R-tree with learned components to predict the set of leaf nodes containing query results, reducing unnecessary leaf visits for high-overlap range queries. It builds the AI-tree by learning a multi-model ensemble over a grid of leaf-node groups, guided by an overlap ratio $\alpha$ and a threshold $\tau$, and complements it with a binary Overlap Ratio classifier to route queries between AI-tree and R-tree. A key contribution is a detailed design and evaluation of insert/update/delete strategies (in-place vs out-of-place), along with a custom NN loss that accounts for data-object contributions to query recall, enabling a mutable AI+R-tree. Experiments on Tweets, Gowalla, and Chicago Crimes datasets show up to 5.4X improvements in average query processing time and up to 99% recall, with modest ML overhead, demonstrating practical gains for dynamic spatial workloads and highlighting tradeoffs across DT-based and NN-based classifiers.

Abstract

Machine Learning (ML) techniques have been successfully applied to design learned database index structures for both one- and multi-dimensional spaces. In particular, a class of traditional multi-dimensional indexes has been augmented with ML models to produce ML-enhanced variants of their traditional counterparts. This paper focuses on the R-tree, as it is widely used for indexing multi-dimensional data. The AI+R-tree is an ML-enhanced index structure that augments a traditional disk-based R-tree with an ML model to improve query processing performance, mainly by avoiding navigation of overlapping R-tree branches that do not yield query results, e.g., when the rectangles of the R-tree nodes overlap heavily. We investigate the empirical tradeoffs in processing dynamic query workloads and in supporting updates over the AI+R-tree. In particular, we investigate the impact of the choice of ML model on the AI+R-tree's query processing performance. Moreover, we present a case study of designing a custom loss function for a neural network model tailored to the query processing requirements of the AI+R-tree. Furthermore, we present the design tradeoffs of various strategies for supporting dynamic inserts, updates, and deletes, with the vision of realizing a mutable AI+R-tree. Experiments on real datasets demonstrate that the AI+R-tree can improve the query processing performance of a traditional R-tree for high-overlap range queries by up to 5.4X while achieving up to 99% average query recall.
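The routing idea from the TL;DR (an overlap ratio $\alpha$ compared against a threshold $\tau$) can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated in the text above: $\alpha$ is taken to be the fraction of visited leaf nodes that actually contain query results, a query counts as high-overlap when $\alpha \le \tau$, and the value $\tau = 0.75$ is purely illustrative. The names `Leaf`, `overlap_ratio`, and `route` are hypothetical. The sketch also computes $\alpha$ exactly from the data, whereas the summary describes a binary classifier that predicts the route.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Point = Tuple[float, float]


@dataclass
class Leaf:
    """A leaf node: its minimum bounding rectangle and the points it stores."""
    mbr: Rect
    points: List[Point]


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two axis-aligned rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(r: Rect, p: Point) -> bool:
    """True if point p lies inside rectangle r (borders included)."""
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def overlap_ratio(leaves: List[Leaf], query: Rect) -> float:
    """Assumed definition: (leaves holding results) / (leaves visited).

    A leaf is 'visited' when its MBR intersects the query; it 'holds results'
    when at least one of its points falls inside the query rectangle.
    """
    visited = [leaf for leaf in leaves if intersects(leaf.mbr, query)]
    if not visited:
        return 1.0  # nothing visited, so no wasted leaf accesses
    useful = [
        leaf for leaf in visited
        if any(contains(query, p) for p in leaf.points)
    ]
    return len(useful) / len(visited)


def route(alpha: float, tau: float) -> str:
    """Send high-overlap queries (alpha <= tau, by assumption) to the AI-tree."""
    return "AI-tree" if alpha <= tau else "R-tree"


# Two leaves with identical MBRs (heavy overlap) and one isolated leaf.
leaves = [
    Leaf((0, 0, 10, 10), [(1, 1), (9, 9)]),
    Leaf((0, 0, 10, 10), [(2, 8), (8, 2)]),
    Leaf((20, 20, 30, 30), [(25, 25)]),
]
TAU = 0.75  # illustrative threshold, not a value from the paper

high_overlap_query = (0, 0, 3, 3)      # touches two MBRs, results in only one
low_overlap_query = (20, 20, 30, 30)   # touches one MBR, which holds results

alpha_high = overlap_ratio(leaves, high_overlap_query)
alpha_low = overlap_ratio(leaves, low_overlap_query)
print(alpha_high, route(alpha_high, TAU))
print(alpha_low, route(alpha_low, TAU))
```

In this toy setup the first query visits two leaves but finds results in only one, so it is the kind of query where skipping unproductive leaves could pay off; the second query wastes no leaf accesses and would stay on the regular R-tree path.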

Paper Structure

This paper contains 62 sections, 1 equation, 19 figures, 3 tables.

Figures (19)

  • Figure 1: An example of an R-tree with overlapping nodes
  • Figure 2: Spectrum of the overlap ratio $\alpha$ with Threshold $\tau$ to identify high- and low-overlap queries
  • Figure 3: The AI+R-tree
  • Figure 4: Workflow of ML model training and testing
  • Figure 5: Indexing the learned models
  • ...and 14 more figures