Fast and Adaptive Bulk Loading of Multidimensional Points

Moin Hussain Moti, Dimitris Papadias

TL;DR

This paper proposes bulk-loading techniques for disk-based multidimensional points that rely on linear scans rather than repeated external sorting, which makes them significantly faster than existing methods. It also develops an adaptive version, AMBI, that uses the query workload to build a partial index covering only the parts of the data space that contain query results.
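To illustrate why replacing full external sorts with a linear scan helps, the following sketch partitions 2-D points into disjoint cells with a single pass over the data; only a small in-memory random sample is sorted to choose split values. This is a generic sample-based partitioning scheme assumed for illustration, not FMBI's actual algorithm, which this summary does not describe. The function names and the `cols`, `rows`, and `sample_size` parameters are hypothetical.

```python
import random
from bisect import bisect_right


def quantile_splits(values, parts):
    """Return parts-1 split values taken as quantiles of a small in-memory list."""
    ordered = sorted(values)  # only the sample is ever sorted
    n = len(ordered)
    if n == 0:
        return []
    return [ordered[(i * n) // parts] for i in range(1, parts)]


def sample_partition(points, cols, rows, sample_size=1000, seed=0):
    """Partition 2-D points into up to cols x rows disjoint cells (hypothetical helper).

    Split values come from a random sample; every point of the full dataset is
    then read exactly once and routed to its cell by binary search.
    """
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))

    # Level 1: vertical slabs from the sample's x-quantiles.
    x_splits = quantile_splits([x for x, _ in sample], cols)

    # Level 2: per-slab y-quantiles, computed from the sample points in that slab.
    slab_samples = [[] for _ in range(len(x_splits) + 1)]
    for x, y in sample:
        slab_samples[bisect_right(x_splits, x)].append(y)
    y_splits = [quantile_splits(ys, rows) for ys in slab_samples]

    # Single linear scan over the full dataset; each point lands in exactly one cell.
    cells = {}
    for x, y in points:
        c = bisect_right(x_splits, x)
        r = bisect_right(y_splits[c], y)
        cells.setdefault((c, r), []).append((x, y))
    return x_splits, y_splits, cells
```

Computing y splits separately within each x slab, rather than using one global y grid, tends to keep cell sizes closer to each other when x and y are correlated, provided the sample is representative. Cells cannot overlap because they are bounded by the same split values used to route the points.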

Abstract

Existing methods for bulk loading disk-based multidimensional points involve multiple applications of external sorting. In this paper, we propose techniques that apply linear scans and are therefore significantly faster. The resulting FMBI index possesses several desirable properties, including almost full and square nodes with zero overlap, and has excellent query performance. As a second contribution, we develop an adaptive version, AMBI, which utilizes the query workload to build a partial index only for parts of the data space that contain query results. Finally, we extend FMBI and AMBI to parallel bulk loading and query processing in distributed systems. An extensive experimental evaluation with real datasets confirms that FMBI and AMBI clearly outperform competitors in terms of combined index construction and query processing cost, sometimes by orders of magnitude.
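The idea of building index structure only where queries land can be illustrated with a generic query-driven refinement in the spirit of adaptive indexing. The sketch below is not AMBI: the `Node` class, the `capacity` parameter, and the midpoint split rule are assumptions chosen for brevity. A node keeps its points unpartitioned until a range query overlaps it; only then is it split with linear-time passes, so regions the workload never touches remain unindexed.

```python
class Node:
    """A kd-style node (hypothetical) that keeps raw points until a query forces a split."""

    def __init__(self, points, depth=0):
        self.points = points      # unpartitioned payload; None once split
        self.axis = depth % 2     # alternate x (0) and y (1) by depth
        self.depth = depth
        self.split = None
        self.left = None          # points with coordinate <= split
        self.right = None         # points with coordinate > split

    def refine(self, capacity):
        """Split once at the midpoint of the coordinate range, using linear-time passes."""
        if self.points is None or len(self.points) <= capacity:
            return
        coords = [p[self.axis] for p in self.points]
        split = (min(coords) + max(coords)) / 2
        left = [p for p in self.points if p[self.axis] <= split]
        right = [p for p in self.points if p[self.axis] > split]
        if not right:             # degenerate range (e.g. all values equal): stay a leaf
            return
        self.split = split
        self.left = Node(left, self.depth + 1)
        self.right = Node(right, self.depth + 1)
        self.points = None


def range_query(node, lo, hi, capacity=64):
    """Return points p with lo <= p <= hi, refining only the nodes the query touches."""
    node.refine(capacity)
    if node.points is not None:   # leaf (small or unsplittable): filter directly
        return [p for p in node.points
                if lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]]
    result = []
    if lo[node.axis] <= node.split:
        result += range_query(node.left, lo, hi, capacity)
    if hi[node.axis] > node.split:
        result += range_query(node.right, lo, hi, capacity)
    return result
```

Midpoint (space-based) splits are used here because they need only the minimum and maximum of a node's coordinates; a median split would balance nodes better but requires a selection step. Later queries over the same area reuse the splits made by earlier ones.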

Paper Structure

This paper contains 11 sections, 11 figures, 1 table, and 2 algorithms.

Figures (11)

  • Figure 1: OSM leaf nodes created by bulk loading methods
  • Figure 2: FMBI Example
  • Figure 3: Minor SplitTrees of $n_3$ and $n_6$
  • Figure 4: FMBI bulk loaded with OSM
  • Figure 5: Adaptive subspace partitions
  • ...and 6 more figures