FlexFlood: Efficiently Updatable Learned Multi-dimensional Index

Fuma Hidaka, Yusuke Matsui

TL;DR

FlexFlood is the first learned multi-dimensional index that guarantees the time complexity of the update operation.

Abstract

A learned multi-dimensional index is a data structure that efficiently answers multi-dimensional orthogonal queries by using machine learning models to capture the data distribution. One problem with existing indexes is that search performance degrades significantly when update operations skew the distribution of the stored data. To overcome this problem, we propose FlexFlood, a flexible variant of Flood. FlexFlood partially reconstructs its internal structure when the data distribution becomes skewed. Moreover, FlexFlood is the first learned multi-dimensional index that guarantees the time complexity of the update operation. Through experiments on both artificial and real-world data, we demonstrate that search under a skewed data distribution is up to 10 times faster than with existing methods. We also found that partial reconstruction takes only about twice as long as naive data updating.

Paper Structure

This paper contains 21 sections, 6 figures, 1 table.

Figures (6)

  • Figure 1: Overview of our method ($D = 2$): Even if $N = 27$ data points are divided equally into $x_1 = 3$ cells at initialization, data updates can skew the distribution, and the updatable Flood slows down. We aim to preserve search performance by partially re-partitioning the cells.
  • Figure 2: Experimental results: The upper panel shows the results for the update queries, and the lower panel shows the results for the search queries. (Lower is better.)
  • Figure 3: Processing time per search query.
  • Figure 4: Heatmaps show percentage changes compared to the experimental results in Section \ref{sec:Reslut}. Blue squares indicate better performance.
  • Figure 5: Overview of "split" ($D = 3, x_1 = 8, x_2 = 3, X = 24, N = 48$): The cells outlined in red are subject to "split" because they contain 13 data points in total. This value exceeds the "split" threshold: $2 \cdot \frac{N}{x_1} = 2 \cdot \frac{48}{8} = 12$. We insert $\frac{X}{x_1} = \frac{24}{8} = 3$ new cells (B-trees), and distribute the 12 data points evenly between the old and new cells, with 6 points in each.
  • ...and 1 more figure
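
The arithmetic in the Figure 5 caption can be checked with a short Python sketch. It is only an illustration of the quantities stated in that caption, not the authors' implementation: the function names (`split_threshold`, `needs_split`, `new_cells_per_split`) are hypothetical, and the code assumes that the "split" condition is a strict comparison against $2 \cdot \frac{N}{x_1}$ and that $X$ is divisible by $x_1$.

```python
def split_threshold(n_points: int, x1: int) -> float:
    """Threshold from the Figure 5 caption: 2 * N / x_1."""
    return 2 * n_points / x1


def needs_split(points_in_group: int, n_points: int, x1: int) -> bool:
    """True when a group of cells holds more points than the threshold.

    Assumption: the comparison is strict ("larger than" in the caption).
    """
    return points_in_group > split_threshold(n_points, x1)


def new_cells_per_split(total_cells: int, x1: int) -> int:
    """Number of cells inserted by one split: X / x_1.

    Assumption: X is a multiple of x_1, as in the caption's example.
    """
    return total_cells // x1


if __name__ == "__main__":
    # Values taken from the Figure 5 caption.
    N, x1, X = 48, 8, 24
    print(split_threshold(N, x1))      # threshold 2 * 48 / 8
    print(needs_split(13, N, x1))      # the outlined cells hold 13 points
    print(new_cells_per_split(X, x1))  # cells added by the split
```

With the caption's values, the outlined group of 13 points exceeds the threshold of 12, so it qualifies for "split", and the split adds 3 cells.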