Efficient $k$-NN Search in IoT Data: Overlap Optimization in Tree-Based Indexing Structures
Ala-Eddine Benrazek, Zineddine Kouahla, Brahim Farou, Hamid Seridi, Ibtissem Kemouguette
TL;DR
The paper tackles scalable $k$-NN search over massive IoT data by addressing overlap in tree-based partitions. It introduces three overlap-quantifying heuristics, the Volume-Based Method ($VBM$), the Distance-Based Method ($DBM$), and the Object-Based Method ($OBM$), together with a DBSCAN-based preprocessing step that enables overlap-aware index construction. In simulations on the Tracking and WARD datasets, $VBM$ consistently delivers balanced structures and the fastest search performance, outperforming the baseline BCCF-tree in both construction cost and query efficiency. The work advances real-time IoT data indexing by enabling scalable, low-overhead $k$-NN retrieval with adaptive overlap management.
Abstract
The proliferation of interconnected devices in the Internet of Things (IoT) has led to an exponential increase in data, commonly known as Big IoT Data. Efficient retrieval of this heterogeneous data demands a robust indexing mechanism for effective organization. However, a significant challenge remains: the overlap in data space partitions during index construction. This overlap increases the number of nodes accessed during search and retrieval, which raises resource consumption, creates performance bottlenecks, and impedes system scalability. To address this issue, we propose three innovative heuristics designed to quantify and strategically reduce data space partition overlap. The volume-based method (VBM) offers a detailed assessment by calculating the intersection volume between partitions, providing deeper insights into spatial relationships. The distance-based method (DBM) enhances efficiency by using the distance between partition centers and their radii to evaluate overlap, offering a streamlined yet accurate approach. Finally, the object-based method (OBM) provides a practical solution by counting the objects that fall in multiple partitions, delivering an intuitive understanding of data space dynamics. Experimental results demonstrate the effectiveness of these methods in reducing search time, underscoring their potential to improve data space partitioning and enhance overall system performance.
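To make the three overlap measures concrete, the following is a minimal sketch that assumes partitions are balls described by a center and a radius. The function names and the exact formulas are illustrative assumptions and are not taken from the paper: the distance-based score is taken here as how far the sum of the radii exceeds the center distance, the object-based score counts points inside both balls, and the volume-based score is shown only for the 2-D case (the area of the lens where two circles intersect).

```python
import math

def dbm_overlap(c1, r1, c2, r2):
    """Distance-based score (assumed form): amount by which r1 + r2 exceeds
    the distance between centers; 0.0 when the balls are disjoint."""
    d = math.dist(c1, c2)
    return max(0.0, r1 + r2 - d)

def obm_overlap(points, c1, r1, c2, r2):
    """Object-based score (assumed form): number of points inside both balls."""
    return sum(
        1 for p in points
        if math.dist(p, c1) <= r1 and math.dist(p, c2) <= r2
    )

def vbm_overlap_2d(c1, r1, c2, r2):
    """Volume-based score, 2-D only: area of intersection of two circles."""
    d = math.dist(c1, c2)
    if d >= r1 + r2:
        return 0.0                          # disjoint circles
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2   # one circle inside the other
    # Standard lens-area formula for two partially overlapping circles.
    a1 = r1 * r1 * math.acos((d * d + r1 * r1 - r2 * r2) / (2 * d * r1))
    a2 = r2 * r2 * math.acos((d * d + r2 * r2 - r1 * r1) / (2 * d * r2))
    k = (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2)
    return a1 + a2 - 0.5 * math.sqrt(k)

# Two unit circles whose centers are one unit apart.
pts = [(0.5, 0.0), (-0.9, 0.0), (1.9, 0.0)]
print(dbm_overlap((0, 0), 1, (1, 0), 1))
print(obm_overlap(pts, (0, 0), 1, (1, 0), 1))
print(vbm_overlap_2d((0, 0), 1, (1, 0), 1))
```

The sketch reflects the trade-off described in the abstract: the distance-based score needs only one distance computation, the object-based score depends on the data actually stored, and the volume-based score requires geometric computation that grows more involved in higher dimensions.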
