
Efficient Algorithms for Top-k Stabbing Queries on Weighted Interval Data (Full Version)

Daichi Amagata, Junya Yamada, Yuchen Ji, Takahiro Hara

TL;DR

This paper tackles the problem of processing top-k weighted stabbing queries on static weighted intervals. It introduces two exact algorithms: Interval Forest, which achieves $O(\sqrt{n}\log n + k)$ query time with $O(n)$ space, and a Segment Tree variant ST-PSA, which achieves $O(\log n + k)$ query time with $O(n\log n\log\log n)$ preprocessing and $O(n\log^2 n)$ space. The authors prove theoretical guarantees and validate them through experiments on two large real datasets, demonstrating clear speedups over the prior state of the art. The results have practical relevance for large-scale interval data in domains such as finance and transportation, and point to future work on dynamic intervals and continuous top-k queries.

Abstract

Intervals are generated in many applications (e.g., temporal databases) and are often associated with weights, such as prices. This paper addresses the problem of processing top-k weighted stabbing queries on interval data. Given a set of weighted intervals, a query value, and a result size $k$, the problem is to find the $k$ intervals with the largest weights among those stabbed by the query value. Although this problem has practical applications (e.g., purchase, vehicle, and cryptocurrency analysis), it has not been well studied. A state-of-the-art algorithm for this problem incurs $O(n\log k)$ time, where $n$ is the number of intervals, so it does not scale to large $n$. We address this inefficiency and propose an algorithm that runs in $O(\sqrt{n}\log n + k)$ time. Furthermore, we propose an $O(\log n + k)$ algorithm that accelerates the search further. Experiments on two large real datasets demonstrate that our algorithms are faster than existing algorithms.
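To make the problem statement concrete, the following is a minimal sketch of a naive linear scan that keeps a size-$k$ min-heap, which matches the $O(n\log k)$ bound quoted above. It is not the paper's Interval Forest or ST-PSA, and it is not claimed to be the exact prior algorithm. The function name `topk_stabbing_scan`, the `(left, right, weight)` tuple layout, and the assumption of closed intervals are illustrative choices, not taken from the paper.

```python
import heapq
from typing import List, Tuple

# Hypothetical layout: (left, right, weight); closed intervals are assumed.
Interval = Tuple[float, float, float]


def topk_stabbing_scan(intervals: List[Interval], q: float, k: int) -> List[Interval]:
    """Return the k heaviest intervals containing q, heaviest first.

    Scans all n intervals and maintains a min-heap of at most k
    candidates, so the running time is O(n log k).
    """
    if k <= 0:
        return []
    heap: List[Tuple[float, int]] = []  # (weight, index); lightest kept at heap[0]
    for i, (lo, hi, w) in enumerate(intervals):
        if lo <= q <= hi:  # the interval is stabbed by q
            if len(heap) < k:
                heapq.heappush(heap, (w, i))
            elif w > heap[0][0]:
                # Evict the lightest candidate in favor of a heavier one.
                heapq.heapreplace(heap, (w, i))
    # Sort the surviving candidates by descending weight.
    return [intervals[i] for _, i in sorted(heap, reverse=True)]


if __name__ == "__main__":
    data = [(1, 5, 10), (2, 8, 30), (6, 9, 20), (4, 7, 25)]
    print(topk_stabbing_scan(data, q=4.5, k=2))
```

Because every interval is inspected regardless of how few are stabbed, the cost grows linearly in $n$, which is the scalability issue the abstract refers to.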

Paper Structure

This paper contains 19 sections, 9 theorems, 1 equation, 5 figures, 4 tables, 2 algorithms.

Key Result

Lemma 1

An interval tree can be built in $O(n\log n)$ time, consumes $O(n)$ space, and processes a stabbing query in $O(\log n + m)$ time, where $m$ is the number of stabbed intervals.
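As an illustration of the structure behind Lemma 1, here is a minimal sketch of a textbook centered interval tree for unweighted stabbing queries. It assumes closed intervals given as `(left, right)` tuples, and the names `Node`, `build`, and `stab` are hypothetical. For brevity this version re-sorts at every level, so its build time is $O(n\log^2 n)$ rather than the $O(n\log n)$ stated in the lemma; the query follows one root-to-leaf path and reports $m$ stabbed intervals in $O(\log n + m)$ time.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # closed interval (left, right); an assumption


class Node:
    def __init__(self, center: float, mid: List[Interval],
                 left: "Optional[Node]", right: "Optional[Node]") -> None:
        self.center = center
        # Intervals containing center, stored in two sorted orders.
        self.by_left = sorted(mid, key=lambda iv: iv[0])                 # ascending left
        self.by_right = sorted(mid, key=lambda iv: iv[1], reverse=True)  # descending right
        self.left = left
        self.right = right


def build(intervals: List[Interval]) -> Optional[Node]:
    if not intervals:
        return None
    endpoints = sorted(p for iv in intervals for p in iv)
    center = endpoints[len(endpoints) // 2]  # median endpoint keeps the tree balanced
    mid = [iv for iv in intervals if iv[0] <= center <= iv[1]]
    lower = [iv for iv in intervals if iv[1] < center]
    upper = [iv for iv in intervals if iv[0] > center]
    return Node(center, mid, build(lower), build(upper))


def stab(node: Optional[Node], q: float) -> List[Interval]:
    """Report every stored interval that contains q."""
    out: List[Interval] = []
    while node is not None:
        if q < node.center:
            # All intervals here end at or after center > q, so only left ends matter.
            for iv in node.by_left:
                if iv[0] > q:
                    break
                out.append(iv)
            node = node.left
        elif q > node.center:
            # Symmetric case: only right ends matter.
            for iv in node.by_right:
                if iv[1] < q:
                    break
                out.append(iv)
            node = node.right
        else:
            # q equals center: every interval at this node is stabbed,
            # and no interval in either subtree can contain q.
            out.extend(node.by_left)
            break
    return out


if __name__ == "__main__":
    tree = build([(1, 5), (2, 8), (6, 9), (4, 7), (10, 12)])
    print(sorted(stab(tree, 4.5)))
```

Each scan stops at the first interval that is not stabbed, so the work per node is proportional to the number of intervals reported there plus a constant.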

Figures (5)

  • Figure 1: Example of the interval and segment tree structures. The red line represents a simple stabbing query $s$, and the traversed path is blue. Note that $x_{3}$ and $x_{6}$ are stabbed by the query.
  • Figure 2: Pre-processing time [sec] vs. dataset size
  • Figure 3: Memory usage [MB] vs. dataset size
  • Figure 4: Running time vs. $k$
  • Figure 5: Running time vs. data size

Theorems & Definitions (12)

  • Definition 1: Stabbing query
  • Definition 2: Top-k weighted stabbing query
  • Lemma 1
  • Lemma 2
  • Theorem 1
  • Corollary 1
  • Lemma 3
  • Theorem 2
  • Example 1
  • Lemma 4
  • ...and 2 more