WaZI: A Learned and Workload-aware Z-Index

Sachith Pai, Michael Mathioudakis, Yanhao Wang

TL;DR

WaZI targets efficient 2D spatial range queries by making the Z-index both data-aware and workload-aware, combining an adaptive layout with learned density estimates to minimize the number of points retrieved per query. It extends the base Z-index by allowing flexible per-node partitioning and two valid child orderings, and adds a page-skipping mechanism that uses look-ahead pointers to prune irrelevant pages. The core contributions are a formal retrieval-cost objective, a greedy index-construction algorithm, and an evaluation showing range queries about 40% faster on average than the baselines, with competitive build time and index size. WaZI remains effective under skewed real-world workloads and offers a practical trade-off among latency, construction cost, and space, with ablations confirming the value of each component.

Abstract

Learned indexes fit machine learning (ML) models to the data and use them to make query operations more time- and space-efficient. Recent works propose using learned spatial indexes to improve spatial query performance by optimizing the storage layout or internal search structures according to the data distribution. However, only a few learned indexes exploit the query workload distribution to enhance their performance. In addition, building and updating learned spatial indexes are often costly on large datasets due to the inefficiency of (re)training ML models. In this paper, we present WaZI, a learned and workload-aware variant of the Z-index, which jointly optimizes the storage layout and search structures, as a viable solution for the above challenges of spatial indexing. Specifically, we first formulate a cost function to measure the performance of a Z-index on a dataset for a range-query workload. Then, we optimize the Z-index structure by minimizing the cost function through adaptive partitioning and ordering for index construction. Moreover, we design a novel page-skipping mechanism to improve the query performance of WaZI by reducing access to irrelevant data pages. Our extensive experiments show that the WaZI index improves range query time by 40% on average over the baselines while always performing better than or comparably to state-of-the-art spatial indexes. It also maintains good point query performance. Overall, WaZI provides favorable tradeoffs among query latency, construction time, and index size.
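
For readers unfamiliar with the base Z-index that WaZI starts from, the following is a minimal sketch, not the authors' implementation. It assumes non-negative integer coordinates, a flat sorted array instead of the paper's tree and page structure, and the common convention of placing the x bits in the even positions of the key. The names `z_value`, `build_index`, and `range_query` are hypothetical. The sketch returns both the matching points and the number of points scanned, which illustrates why a cost based on retrieved points is a natural objective: the scan between the Z-values of the two query corners usually touches points outside the query rectangle.

```python
import bisect


def z_value(x, y, bits=16):
    """Interleave the bits of x and y into a single Z-order (Morton) key.

    Assumption: x bits go to even positions, y bits to odd positions.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def build_index(points):
    """Sort points by Z-value; return the sorted points and their keys."""
    ordered = sorted(points, key=lambda p: z_value(p[0], p[1]))
    keys = [z_value(px, py) for px, py in ordered]
    return ordered, keys


def range_query(ordered, keys, lo, hi):
    """Answer the rectangle query [lo, hi] (inclusive corners).

    Every point inside the rectangle has a Z-value between those of the
    two corners, so scanning that key interval and filtering is correct,
    but the interval may also contain points outside the rectangle.
    """
    start = bisect.bisect_left(keys, z_value(lo[0], lo[1]))
    end = bisect.bisect_right(keys, z_value(hi[0], hi[1]))
    scanned = ordered[start:end]
    hits = [p for p in scanned
            if lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]]
    return hits, len(scanned)


if __name__ == "__main__":
    grid = [(x, y) for x in range(8) for y in range(8)]
    ordered, keys = build_index(grid)
    hits, scanned = range_query(ordered, keys, (1, 1), (2, 2))
    print(f"{len(hits)} matching points, {scanned} points scanned")
```

The gap between the two printed numbers is the wasted work that a workload-aware layout and page skipping aim to reduce.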

Paper Structure

This paper contains 21 sections, 5 equations, 13 figures, 5 tables, and 4 algorithms.

Figures (13)

  • Figure 1: Illustrations of the base Z-curve and Z-index and their variants proposed in this work.
  • Figure 2: Illustration of the quaternary tree structure of a Z-index. Each leaf node holds a pointer to its subsequent leaf node as well as a pointer to the page containing its corresponding data points.
  • Figure 3: Illustration of skipping during range query processing. (a) Standard processing of range query $R$ (red) scans the pages in range $[a:m]$. (b) The four different irrelevancy criteria explained. (c) Motivating example for efficient skipping. As we process page $b$, we know that it does not overlap the query because of the Below criterion. We also know that the next page in the sort-order that may satisfy the criterion is $d$. Similarly, at page $d$ we can skip ahead to page $j$, saving the computation required to process pages $e$ through $i$.
  • Figure 4: Average range query performance of all indexes considered in the experiments.
  • Figure 5: Datasets and query workloads in the experiments.
  • ...and 8 more figures
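
The skipping idea described in the Figure 3 caption can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm. Each page is assumed to carry an axis-aligned bounding box. Only the Below criterion is named in the caption, so the names Above, Left, and Right for the other three are assumptions. The look-ahead rule used here is also an assumption, chosen because it is provably safe: for Below, the pointer leads to the next page in sort order whose top edge is higher than the current page's top edge. The class `Page` and the functions `build_lookahead` and `range_query` are hypothetical names, and the quadratic-time pointer construction is for clarity only.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    points: list
    skip: dict = field(default_factory=dict)  # criterion -> next page index


def failed_criteria(page, q):
    """Return the irrelevancy criteria that page satisfies for query q."""
    qx0, qy0, qx1, qy1 = q
    out = []
    if page.ymax < qy0:
        out.append("below")   # page lies entirely below the query
    if page.ymin > qy1:
        out.append("above")
    if page.xmax < qx0:
        out.append("left")
    if page.xmin > qx1:
        out.append("right")
    return out


def build_lookahead(pages):
    """For each page and criterion, store the next page that extends further.

    If page i is below a query, every later page whose top edge is not
    higher than page i's top edge is below that query too, so it is safe
    to jump to the first later page with a higher top edge.
    """
    extends_further = {
        "below": lambda p, cur: p.ymax > cur.ymax,
        "above": lambda p, cur: p.ymin < cur.ymin,
        "left": lambda p, cur: p.xmax > cur.xmax,
        "right": lambda p, cur: p.xmin < cur.xmin,
    }
    n = len(pages)
    for i, cur in enumerate(pages):
        for name, test in extends_further.items():
            cur.skip[name] = next(
                (j for j in range(i + 1, n) if test(pages[j], cur)), n)


def range_query(pages, q):
    """Scan pages in sort order, jumping over provably irrelevant runs."""
    qx0, qy0, qx1, qy1 = q
    hits, visited, i = [], 0, 0
    while i < len(pages):
        page = pages[i]
        visited += 1
        failed = failed_criteria(page, q)
        if failed:
            # Each pointer is safe on its own, so the farthest one is safe.
            i = max(page.skip[c] for c in failed)
            continue
        hits.extend(p for p in page.points
                    if qx0 <= p[0] <= qx1 and qy0 <= p[1] <= qy1)
        i += 1
    return hits, visited
```

Because every look-ahead pointer skips only pages that satisfy the same criterion for the same query, the result matches a full scan while fewer page bounding boxes are inspected.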