Efficient $k$-NN Search in IoT Data: Overlap Optimization in Tree-Based Indexing Structures

Ala-Eddine Benrazek, Zineddine Kouahla, Brahim Farou, Hamid Seridi, Ibtissem Kemouguette

TL;DR

The paper tackles scalable $k$-NN search over massive IoT data by addressing overlap in tree-based partitions. It introduces three overlap-quantifying heuristics, the Volume-Based Method ($VBM$), the Distance-Based Method ($DBM$), and the Object-Based Method ($OBM$), together with a DBSCAN preprocessing step that enables overlap-aware index construction. In simulations on the Tracking and WARD datasets, $VBM$ consistently delivers balanced structures and the fastest search performance, outperforming the baseline BCCF-tree in both construction cost and query efficiency. The work advances real-time IoT data indexing by enabling scalable, low-overhead $k$-NN retrieval with adaptive overlap management.

Abstract

The proliferation of interconnected devices in the Internet of Things (IoT) has led to an exponential increase in data, commonly known as Big IoT Data. Efficient retrieval of this heterogeneous data demands a robust indexing mechanism for effective organization. However, a significant challenge remains: the overlap in data space partitions during index construction. This overlap increases node access during search and retrieval, resulting in higher resource consumption and performance bottlenecks, and impeding system scalability. To address this issue, we propose three heuristics designed to quantify and strategically reduce data space partition overlap. The volume-based method (VBM) offers a detailed assessment by calculating the intersection volume between partitions, providing deeper insight into spatial relationships. The distance-based method (DBM) improves efficiency by using the distance between partition centers and their radii to evaluate overlap, offering a streamlined yet accurate approach. Finally, the object-based method (OBM) provides a practical solution by counting objects that fall in multiple partitions, delivering an intuitive understanding of data space dynamics. Experimental results demonstrate the effectiveness of these methods in reducing search time, underscoring their potential to improve data space partitioning and enhance overall system performance.
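
To make the three overlap measures concrete, the following is a minimal Python sketch for two ball-shaped partitions, each given by a center and a radius. It is an illustration under stated assumptions, not the paper's formulation: the function names, the normalizations, and the restriction of the volume measure to three dimensions are all hypothetical choices made here, whereas the paper defines its measures for general hyperballs.

```python
import math

def distance_overlap_rate(c1, r1, c2, r2):
    """Distance-based score (assumed normalization, radii must be positive).

    Returns 0 when the balls are disjoint or tangent and 1 when they are concentric.
    """
    d = math.dist(c1, c2)
    return max(0.0, (r1 + r2 - d) / (r1 + r2))

def lens_volume_3d(r1, r2, d):
    """Volume-based score: intersection volume of two 3-D balls with centers d apart."""
    if d >= r1 + r2:          # disjoint or tangent: no shared volume
        return 0.0
    if d <= abs(r1 - r2):     # one ball lies inside the other
        return 4.0 / 3.0 * math.pi * min(r1, r2) ** 3
    # Closed form for the lens formed by two spherical caps.
    return (math.pi * (r1 + r2 - d) ** 2
            * (d * d + 2 * d * (r1 + r2) - 3 * (r1 - r2) ** 2) / (12 * d))

def object_overlap_rate(points, c1, r1, c2, r2):
    """Object-based score (assumed normalization).

    Fraction of the objects covered by either ball that are covered by both.
    """
    in1 = [math.dist(p, c1) <= r1 for p in points]
    in2 = [math.dist(p, c2) <= r2 for p in points]
    shared = sum(a and b for a, b in zip(in1, in2))
    covered = sum(a or b for a, b in zip(in1, in2))
    return shared / covered if covered else 0.0

# Two unit balls whose centers are one unit apart, plus four sample objects.
c1, c2 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
points = [(0.5, 0.0, 0.0), (-0.5, 0.0, 0.0), (1.5, 0.0, 0.0), (5.0, 5.0, 5.0)]
dbm = distance_overlap_rate(c1, 1.0, c2, 1.0)
vbm = lens_volume_3d(1.0, 1.0, math.dist(c1, c2))
obm = object_overlap_rate(points, c1, 1.0, c2, 1.0)
```

The sketch mirrors the trade-off the abstract describes: the distance-based score needs only the centers and radii, the volume-based score requires a geometric computation, and the object-based score depends on the data actually stored in the partitions.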

Paper Structure (18 sections, 15 equations, 21 figures, 1 table, 2 algorithms)

Figures (21)

  • Figure 1: Growth of Connected Devices and Data Volume (2015-2025).
  • Figure 2: An example of overlapping partitions.
  • Figure 3: Geometry of the intersection of two hyperballs.
  • Figure 4: An example of a spherical cap.
  • Figure 5: Bucket data distribution using DBM for Tracking datasets.
  • ...and 16 more figures

Theorems & Definitions (12)

  • Definition 1: Metric Space
  • Definition 2: Problem of Data Indexing
  • Definition 3: Generalized Hyperplane Partitioning
  • Definition 4: $k$-Nearest Neighbor Query
  • Definition 5: Hyperball
  • Definition 6: $\epsilon$-Neighborhood of an Object
  • Definition 7
  • Definition 8: Volume of a Hyperball
  • Definition 9: Volume of Hyperspherical Cap
  • Definition 10: Overlapping Distance Rate
  • ...and 2 more