Updatable Balanced Index for Fast On-device Search with Auto-selection Model

Yushuai Ji, Sheng Wang, Zhiyu Chen, Yuan Sun, Zhiyong Peng

TL;DR

UnIS tackles the core bottlenecks of on-device kNN/radius search by combining an updatable balanced index with predictive partitioning, selective sub-tree rebuilding, and an auto-selected search strategy. It replaces costly sorting in BMKD-tree construction with a data-driven pivot prediction and AEPL-guided partitioning, while enabling real-time insertion through selective rebuilding and incremental model updates. An auto-selection model leverages enriched query features, including a graph-based F2 component, to choose the fastest search strategy per query, boosting practical on-device performance. Empirical results demonstrate substantial speedups in index construction, insertion, and kNN search, and notable gains in edge-based k-means acceleration, indicating strong applicability for real-time analytics on resource-constrained devices.

Abstract

Diverse types of edge data, such as 2D geo-locations and 3D point clouds, are collected by sensors like lidar and GPS receivers on edge devices. On-device searches, such as k-nearest neighbor (kNN) search and radius search, are commonly used to enable fast analytics and learning technologies, such as k-means dataset simplification using kNN. To maintain high search efficiency, a representative approach is to utilize a balanced multi-way KD-tree (BMKD-tree). However, the index has shown limited gains, mainly due to substantial construction overhead, inflexibility to real-time insertion, and inconsistent query performance. In this paper, we propose UnIS to address the above limitations. We first accelerate the construction process of the BMKD-tree by utilizing the dataset distribution to predict the splitting hyperplanes. To make the continuously generated data searchable, we propose a selective sub-tree rebuilding scheme to accelerate rebalancing during insertion by reducing the number of data points involved. We then propose an auto-selection model to improve query performance by automatically selecting the optimal search strategy among multiple strategies for an arbitrary query task. Experimental results show that UnIS achieves average speedups of 17.96x in index construction, 1.60x in insertion, 7.15x in kNN search, and 1.09x in radius search compared to the BMKD-tree. We further verify its effectiveness in accelerating dataset simplification on edge devices, achieving a speedup of 217x over Lloyd's algorithm.

Paper Structure

This paper contains 31 sections, 3 theorems, 18 equations, 14 figures, 10 tables, 6 algorithms.

Key Result

Lemma 1

(Rectangle-Rectangle Pruning) Given a dataset $\mathbf{D}=\{\mathbf{p}_i\}_{i=1}^{n} \in \mathbb{R}^{n \times d}$ and a subset $\mathbf{P}\subseteq \mathbf{D}$ that is compressed by the minimum bounding rectangle (MBR) $R_1=(\mathbf{bp}, \mathbf{tp})$, where $\mathbf{tp}=(tp_{1},tp_{2},\cdots,tp_{d}) \in \mathbb{R}^{d}$ ... then the two MBRs intersect.
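
The lemma as reproduced here does not spell out its intersection condition, so the following is only a minimal sketch of the standard axis-aligned overlap test that this kind of pruning usually relies on. It is not necessarily the paper's exact formulation. The helper names `mbr_of` and `mbrs_intersect` are hypothetical, and an MBR is assumed to be a pair (bottom point, top point) of closed bounds per dimension.

```python
from typing import Sequence, Tuple

Point = Sequence[float]
# Assumed representation: (bp, tp) = (bottom point, top point) of the rectangle.
MBR = Tuple[Tuple[float, ...], Tuple[float, ...]]


def mbr_of(points: Sequence[Point]) -> MBR:
    """Compute the minimum bounding rectangle of a non-empty point set."""
    dims = range(len(points[0]))
    bp = tuple(min(p[j] for p in points) for j in dims)  # per-dimension minimum
    tp = tuple(max(p[j] for p in points) for j in dims)  # per-dimension maximum
    return bp, tp


def mbrs_intersect(r1: MBR, r2: MBR) -> bool:
    """Standard closed-box test: the rectangles overlap iff their
    intervals overlap in every dimension."""
    (bp1, tp1), (bp2, tp2) = r1, r2
    return all(
        b1 <= t2 and b2 <= t1
        for b1, t1, b2, t2 in zip(bp1, tp1, bp2, tp2)
    )


if __name__ == "__main__":
    node_box = mbr_of([(0.0, 0.0), (2.0, 3.0)])
    query_box = ((1.0, 1.0), (4.0, 4.0))
    far_box = ((3.0, 0.0), (5.0, 1.0))
    print(mbrs_intersect(node_box, query_box))  # overlapping boxes
    print(mbrs_intersect(node_box, far_box))    # disjoint along the first axis
```

In a tree search, a node whose MBR fails this test against the query region can be skipped without inspecting its points, which is the usual purpose of a rectangle-rectangle pruning rule.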

Figures (14)

  • Figure 1: Performance comparisons of the balanced multi-way KD-tree.
  • Figure 2: A balanced multi-way KD-tree with a partition number of $t=3$.
  • Figure 3: Impact of $t$ values on AEPL with leaf node capacity $c = 3$. Notably, in (a), since the data points in each subtree after the first partition do not fall below $c$, a second partitioning is required, represented by the dashed lines.
  • Figure 4: Impact of different values of $t$ on AEPL and $k$NN efficiency (100 tasks) with leaf capacity $c=30$ on ArgoPC.
  • Figure 5: An example of the rebalance process.
  • ...and 9 more figures

Theorems & Definitions (15)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Definition 7
  • Definition 7
  • Lemma 1
  • Lemma 2
  • ...and 5 more